Message ID | 20171020103840.32762-12-wei.liu2@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Wei Liu writes ("[OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > See code comment for explanation. ... > + # There might be stale entries in /etc/initramfs-tools/conf.d/resume > + # which get stored in the initramfs. That introduces delay in guest booting > + # which might cause tests to fail. Why might there be such stale entries ? > + # This is particularly prominent in systemd enabled releases when it tries > + # to scan for the inexistent device(s) for a long time. Why is this not a bug in systemd ? If there are workarounds in osstest for Debian, I usually want a bug in the BTS, and a suite limit on the workaround so that we poke Debian again in 2 years time... > + # Override that in kernel command line with the correct swap partition. > + $cfg =~ m/'phy:.+-swap,(xvda\d+),.*'/; > + $extra .= " resume=/dev/$1"; > + logm("change resume device to $1"); Alternatively, if indeed this is necessary and not due to bugs, perhaps it should be done by xen-tools ? Ian.
On Fri, Oct 20, 2017 at 12:05:55PM +0100, Ian Jackson wrote: > Wei Liu writes ("[OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > See code comment for explanation. > ... > > + # There might be stale entries in /etc/initramfs-tools/conf.d/resume > > + # which get stored in the initramfs. That introduces delay in guest booting > > + # which might cause tests to fail. > > Why might there be such stale entries ? The ramdisk is taken from the host, which contains that file. The resume device is going to point to the swap partition in the host. > > > + # This is particularly prominent in systemd enabled releases when it tries > > + # to scan for the inexistent device(s) for a long time. > > Why is this not a bug in systemd ? If there are workarounds in > osstest for Debian, I usually want a bug in the BTS, and a suite limit > on the workaround so that we poke Debian again in 2 years time... > It has always been like that since forever. It appears that stretch will try a lot harder than jessie (which only tries once). Now thinking about it I can't say for sure whether that's due to the new kernel, or systemd or both. > > + # Override that in kernel command line with the correct swap partition. > > + $cfg =~ m/'phy:.+-swap,(xvda\d+),.*'/; > > + $extra .= " resume=/dev/$1"; > > + logm("change resume device to $1"); > > Alternatively, if indeed this is necessary and not due to bugs, > perhaps it should be done by xen-tools ? > No, see the first paragraph. xen-tools can't be expected to modify the ramdisk.
Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > On Fri, Oct 20, 2017 at 12:05:55PM +0100, Ian Jackson wrote: > > Wei Liu writes ("[OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > > See code comment for explanation. > > ... > > > + # There might be stale entries in /etc/initramfs-tools/conf.d/resume > > > + # which get stored in the initramfs. That introduces delay in guest booting > > > + # which might cause tests to fail. > > > > Why might there be such stale entries ? > > The ramdisk is taken from the host, which contains that file. The resume > device is going to point to the swap partition in the host. Ah. Hrm. I guess this was always a bit of a bodge. > > > + # Override that in kernel command line with the correct swap partition. > > > + $cfg =~ m/'phy:.+-swap,(xvda\d+),.*'/; > > > + $extra .= " resume=/dev/$1"; > > > + logm("change resume device to $1"); > > > > Alternatively, if indeed this is necessary and not due to bugs, > > perhaps it should be done by xen-tools ? > > No, see the first paragraph. xen-tools can't be expected to modify the > ramdisk. Maybe it should set the appropriate "extra" though. IIRC xen-tools does implement this sort-of-trick of reusing the hosts initrd. So why does this bug not happen with other users of xen-tools ? Ian.
On Fri, Oct 20, 2017 at 01:35:41PM +0100, Ian Jackson wrote: > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > On Fri, Oct 20, 2017 at 12:05:55PM +0100, Ian Jackson wrote: > > > Wei Liu writes ("[OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > > > See code comment for explanation. > > > ... > > > > + # There might be stale entries in /etc/initramfs-tools/conf.d/resume > > > > + # which get stored in the initramfs. That introduces delay in guest booting > > > > + # which might cause tests to fail. > > > > > > Why might there be such stale entries ? > > > > The ramdisk is taken from the host, which contains that file. The resume > > device is going to point to the swap partition in the host. > > Ah. Hrm. I guess this was always a bit of a bodge. > > > > > + # Override that in kernel command line with the correct swap partition. > > > > + $cfg =~ m/'phy:.+-swap,(xvda\d+),.*'/; > > > > + $extra .= " resume=/dev/$1"; > > > > + logm("change resume device to $1"); > > > > > > Alternatively, if indeed this is necessary and not due to bugs, > > > perhaps it should be done by xen-tools ? > > > > No, see the first paragraph. xen-tools can't be expected to modify the > > ramdisk. > > Maybe it should set the appropriate "extra" though. IIRC xen-tools > does implement this sort-of-trick of reusing the hosts initrd. So why > does this bug not happen with other users of xen-tools ? > I wouldn't call this behaviour a bug. Most people won't notice it because the guest will boot eventually. It is not like this will cause the guest to crash. Osstest sets a timeout. We can bump the timeout but that's worse than the solution here. There is no option in xen-create-image to manipulate extra=. The only viable solution is to provide a new template. That seems overkill and would require more code.
Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > I wouldn't call this behaviour a bug. Most people won't notice it > because the guest will boot eventually. It is not like this will cause > the guest to crash. I don't think the guest booting "eventually" after some timeout is correct. > There is no option in xen-create-image to manipulate extra=. The only > viable solution is to provide a new template. That seems overkill and > would require more code. What you're saying is that this infelicity in xen-tools is not particularly easy to fix. That doesn't mean it shouldn't be reported as a bug. Thanks, Ian.
On Fri, Oct 20, 2017 at 02:20:29PM +0100, Ian Jackson wrote: > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > I wouldn't call this behaviour a bug. Most people won't notice it > > because the guest will boot eventually. It is not like this will cause > > the guest to crash. > > I don't think the guest booting "eventually" after some timeout is > correct. > The OS is honoring whatever is written in the resume file. That's the correct behaviour to me. Booting after trying is better than not booting at all. That parameter isn't vital to a functional system. This has always been the case since forever. It is only the "timeout" in the past is shorter so osstest never noticed it. What is not correct is the content of the file. We can't blame the guest OS for that. > > There is no option in xen-create-image to manipulate extra=. The only > > viable solution is to provide a new template. That seems overkill and > > would require more code. > > What you're saying is that this infelicity in xen-tools is not > particularly easy to fix. That doesn't mean it shouldn't be reported > as a bug. There is nothing to fix. System administrator should provide a proper template -- that's what xen-tools developers most likely to say. It would require a lot more code in osstest to fix the template, and the essence of such patch would still be the same.
Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > On Fri, Oct 20, 2017 at 02:20:29PM +0100, Ian Jackson wrote: > > I don't think the guest booting "eventually" after some timeout is > > correct. > > The OS is honoring whatever is written in the resume file. That's the > correct behaviour to me. > > Booting after trying is better than not booting at all. That parameter > isn't vital to a functional system. This has always been the case since > forever. It is only the "timeout" in the past is shorter so osstest > never noticed it. > > What is not correct is the content of the file. We can't blame the guest > OS for that. I agree with all of this. But, the overall behaviour is not correct. Even though all of the things that osstest specified to xen-create-image were correct. Therefore there is a bug. > > > There is no option in xen-create-image to manipulate extra=. The only > > > viable solution is to provide a new template. That seems overkill and > > > would require more code. > > > > What you're saying is that this infelicity in xen-tools is not > > particularly easy to fix. That doesn't mean it shouldn't be reported > > as a bug. > > There is nothing to fix. System administrator should provide a proper > template -- that's what xen-tools developers most likely to say. I'm not sure what you mean here. We are using the config template provided by xen-tools. The reuse of the host's initrd is suggested by the xen-create-image documentation. Ian.
On Fri, Oct 20, 2017 at 02:42:53PM +0100, Ian Jackson wrote: > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > On Fri, Oct 20, 2017 at 02:20:29PM +0100, Ian Jackson wrote: > > > I don't think the guest booting "eventually" after some timeout is > > > correct. > > > > The OS is honoring whatever is written in the resume file. That's the > > correct behaviour to me. > > > > Booting after trying is better than not booting at all. That parameter > > isn't vital to a functional system. This has always been the case since > > forever. It is only the "timeout" in the past is shorter so osstest > > never noticed it. > > > > What is not correct is the content of the file. We can't blame the guest > > OS for that. > > I agree with all of this. But, the overall behaviour is not correct. > Even though all of the things that osstest specified to > xen-create-image were correct. Therefore there is a bug. > > > > > There is no option in xen-create-image to manipulate extra=. The only > > > > viable solution is to provide a new template. That seems overkill and > > > > would require more code. > > > > > > What you're saying is that this infelicity in xen-tools is not > > > particularly easy to fix. That doesn't mean it shouldn't be reported > > > as a bug. > > > > There is nothing to fix. System administrator should provide a proper > > template -- that's what xen-tools developers most likely to say. > > I'm not sure what you mean here. We are using the config template > provided by xen-tools. The reuse of the host's initrd is suggested by > the xen-create-image documentation. > TBH I don't know what to write in a report. What is the expected behaviour?
Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > On Fri, Oct 20, 2017 at 02:42:53PM +0100, Ian Jackson wrote: > > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > I'm not sure what you mean here. We are using the config template > > provided by xen-tools. The reuse of the host's initrd is suggested by > > the xen-create-image documentation. > > TBH I don't know what to write in a report. What is the expected > behaviour? How about this ? Adjust to taste, correcting any lies/misunderstandings, maybe adding references, etc. Steps to reproduce: Follow instructions in xen-create-image, including suggestion to re-use host's initrd, to make a guest running stretch. Try to boot it. Expected behaviour: Guest boots promptly. Observed behaviour: Guest pauses for a long time during boot waiting for the "resume device" to become available. Analysis: This is because the initrd has the host's swap device configured. With stretch, and systemd, the delay is longer; earler Debian releases had a similar issue but the shorter timeout meant it was a less serious problem. Suggested remedy: It is unfortunate that the initrd has the resume device baked into it. Normally this kind of thing (eg root=) comes via the command line. So maybe the bug is in the initrd generation. Alternatively, maybe xen-create-image should write a setting "extra" into the domain config file "resume=<swap device>". Ian.
On Fri, Oct 20, 2017 at 04:10:16PM +0100, Ian Jackson wrote: > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > On Fri, Oct 20, 2017 at 02:42:53PM +0100, Ian Jackson wrote: > > > Wei Liu writes ("Re: [OSSTEST PATCH 11/16] ts-debian-fixup: use correct resume device"): > > > I'm not sure what you mean here. We are using the config template > > > provided by xen-tools. The reuse of the host's initrd is suggested by > > > the xen-create-image documentation. > > > > TBH I don't know what to write in a report. What is the expected > > behaviour? > > How about this ? Adjust to taste, correcting any > lies/misunderstandings, maybe adding references, etc. Actually, there is already a Debian bug. See #784810.
diff --git a/ts-debian-fixup b/ts-debian-fixup index d476467..fe37c30 100755 --- a/ts-debian-fixup +++ b/ts-debian-fixup @@ -176,6 +176,18 @@ sub otherfixupcfg () { } + # There might be stale entries in /etc/initramfs-tools/conf.d/resume + # which get stored in the initramfs. That introduces delay in guest booting + # which might cause tests to fail. + # + # This is particularly prominent in systemd enabled releases when it tries + # to scan for the inexistent device(s) for a long time. + # + # Override that in kernel command line with the correct swap partition. + $cfg =~ m/'phy:.+-swap,(xvda\d+),.*'/; + $extra .= " resume=/dev/$1"; + logm("change resume device to $1"); + $cfg =~ s/^extra.*//mg; $cfg .= "\nextra='$extra'\n"; };
See code comment for explanation. Signed-off-by: Wei Liu <wei.liu2@citrix.com> --- ts-debian-fixup | 12 ++++++++++++ 1 file changed, 12 insertions(+)