Message ID | 20200610154923.27510-5-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kmod/umh: a few fixes | expand |
Jens Markwardt reported a regression in the linux-next runs. with "umh: fix processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge with an KVM guests no longer activates : without patch # ip addr show dev virbr1 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 valid_lft forever preferred_lft forever with this patch the bridge stays DOWN with NO-CARRIER # ip addr show dev virbr1 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 valid_lft forever preferred_lft forever This was bisected in linux-next. Reverting from linux-next also fixes the issue. Any idea? Christian
On 23.06.20 16:11, Christian Borntraeger wrote: > Jens Markwardt reported a regression in the linux-next runs. with "umh: fix > processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge > with an KVM guests no longer activates : > > without patch > # ip addr show dev virbr1 > 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 > link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > valid_lft forever preferred_lft forever > > with this patch the bridge stays DOWN with NO-CARRIER > > # ip addr show dev virbr1 > 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 > link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > valid_lft forever preferred_lft forever > > This was bisected in linux-next. Reverting from linux-next also fixes the issue. > > Any idea? FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong.
On 23.06.20 16:23, Christian Borntraeger wrote: > > > On 23.06.20 16:11, Christian Borntraeger wrote: >> Jens Markwardt reported a regression in the linux-next runs. with "umh: fix >> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge >> with an KVM guests no longer activates : >> >> without patch >> # ip addr show dev virbr1 >> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 >> valid_lft forever preferred_lft forever >> >> with this patch the bridge stays DOWN with NO-CARRIER >> >> # ip addr show dev virbr1 >> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 >> valid_lft forever preferred_lft forever >> >> This was bisected in linux-next. Reverting from linux-next also fixes the issue. >> >> Any idea? > > FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong. Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the linux-bridge on s390?
On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: > > > On 23.06.20 16:23, Christian Borntraeger wrote: > > > > > > On 23.06.20 16:11, Christian Borntraeger wrote: > >> Jens Markwardt reported a regression in the linux-next runs. with "umh: fix > >> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge > >> with an KVM guests no longer activates : > >> > >> without patch > >> # ip addr show dev virbr1 > >> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 > >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > >> valid_lft forever preferred_lft forever > >> > >> with this patch the bridge stays DOWN with NO-CARRIER > >> > >> # ip addr show dev virbr1 > >> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 > >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > >> valid_lft forever preferred_lft forever > >> > >> This was bisected in linux-next. Reverting from linux-next also fixes the issue. > >> > >> Any idea? > > > > FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong. > > Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the > linux-bridge on s390? glibc for instance defines __WEXITSTATUS in only one location: bits/waitstatus.h and it does not special case it per architecture, so at this point I'd have to say we have to look somewhere else for why this is happening. The commmit which caused this is issuing a correct error code down the pipeline, nothing more. I'll make taking a look at this a priority right now. Let us see what I come up with today. Luis
Martin, your eyeballs would be appreciated for a bit on this. On Wed, Jun 24, 2020 at 12:05:46PM +0000, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: > > > > > > On 23.06.20 16:23, Christian Borntraeger wrote: > > > > > > > > > On 23.06.20 16:11, Christian Borntraeger wrote: > > >> Jens Markwardt reported a regression in the linux-next runs. with "umh: fix > > >> processed error when UMH_WAIT_PROC is used" (from linux-next) a linux bridge > > >> with an KVM guests no longer activates : > > >> > > >> without patch > > >> # ip addr show dev virbr1 > > >> 6: virbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 > > >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > > >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > > >> valid_lft forever preferred_lft forever > > >> > > >> with this patch the bridge stays DOWN with NO-CARRIER > > >> > > >> # ip addr show dev virbr1 > > >> 6: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000 > > >> link/ether 52:54:00:1e:3f:c0 brd ff:ff:ff:ff:ff:ff > > >> inet 192.168.254.254/24 brd 192.168.254.255 scope global virbr1 > > >> valid_lft forever preferred_lft forever > > >> > > >> This was bisected in linux-next. Reverting from linux-next also fixes the issue. > > >> > > >> Any idea? > > > > > > FWIW, s390 is big endian. Maybe some of the shifts inn the __KW* macros are wrong. > > > > Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the > > linux-bridge on s390? > > glibc for instance defines __WEXITSTATUS in only one location: bits/waitstatus.h > and it does not special case it per architecture, so at this point I'd > have to say we have to look somewhere else for why this is happening. I found however an LTP bug indicating the need to test for s390 wait macros [0] in light of a recent bug in glibc for s390. I am asking for references to that issue given I cannot find any mention of this on glibc yet. I'm in hopes Martin might be aware of that mentioned s390 glic bug. [0] https://github.com/linux-test-project/ltp/issues/605 Luis
On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: > Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the > linux-bridge on s390? Are we even sure this is s390 specific and doesn't happen on other architectures with the same bridge setup?
On 24.06.20 16:43, Christoph Hellwig wrote: > On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: >> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the >> linux-bridge on s390? > > Are we even sure this is s390 specific and doesn't happen on other > architectures with the same bridge setup? Fair point. AFAIK nobody has tested this yet on x86.
On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote: > > > On 24.06.20 16:43, Christoph Hellwig wrote: > > On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: > >> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the > >> linux-bridge on s390? > > > > Are we even sure this is s390 specific and doesn't happen on other > > architectures with the same bridge setup? > > Fair point. AFAIK nobody has tested this yet on x86. Regardless, can you enable dynamic debug prints, to see if the kernel reveals anything on the bridge code which may be relevant: echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control Luis
On Wed, Jun 24, 2020 at 01:17:25PM +0000, Luis Chamberlain wrote: > I found however an LTP bug indicating the need to test for > s390 wait macros [0] in light of a recent bug in glibc for s390. > I am asking for references to that issue given I cannot find > any mention of this on glibc yet. > > [0] https://github.com/linux-test-project/ltp/issues/605 I looked into this and the bug associated was: https://sourceware.org/bugzilla/show_bug.cgi?id=19613 The commit in question was upstream glibc commit b49ab5f4503f36dcbf43f821f817da66b2931fe6 ("Remove union wait [BZ #19613]"), and while I don't see anything s390 mentioned there, the issue there was due to the caller of the wait using a long instead of an int for the return value. In other words, that'd not the droid we are looking for. So the issue is something else. Luis
On 24.06.20 18:09, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote: >> >> >> On 24.06.20 16:43, Christoph Hellwig wrote: >>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: >>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the >>>> linux-bridge on s390? >>> >>> Are we even sure this is s390 specific and doesn't happen on other >>> architectures with the same bridge setup? >> >> Fair point. AFAIK nobody has tested this yet on x86. > > Regardless, can you enable dynamic debug prints, to see if the kernel > reveals anything on the bridge code which may be relevant: > > echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control > > Luis When I start a guest the following happens with the patch: [ 47.420237] virbr0: port 2(vnet0) entered blocking state [ 47.420242] virbr0: port 2(vnet0) entered disabled state [ 47.420315] device vnet0 entered promiscuous mode [ 47.420365] virbr0: port 2(vnet0) event 16 [ 47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 47.420373] virbr0: toggle option: 12 state: 0 -> 0 [ 47.420536] virbr0: port 2(vnet0) entered blocking state [ 47.420538] virbr0: port 2(vnet0) event 16 [ 47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0 and the nothing happens. without the patch [ 33.805410] virbr0: hello timer expired [ 35.805413] virbr0: hello timer expired [ 36.184349] virbr0: port 2(vnet0) entered blocking state [ 36.184353] virbr0: port 2(vnet0) entered disabled state [ 36.184427] device vnet0 entered promiscuous mode [ 36.184479] virbr0: port 2(vnet0) event 16 [ 36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 36.184487] virbr0: toggle option: 12 state: 0 -> 0 [ 36.184636] virbr0: port 2(vnet0) entered blocking state [ 36.184638] virbr0: port 2(vnet0) entered listening state [ 36.184639] virbr0: port 2(vnet0) event 16 [ 36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 36.184645] virbr0: port 2(vnet0) event 16 [ 36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 37.805478] virbr0: hello timer expired [ 38.205413] virbr0: port 2(vnet0) forward delay timer [ 38.205414] virbr0: port 2(vnet0) entered learning state [ 38.205427] virbr0: port 2(vnet0) event 16 [ 38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 38.765414] virbr0: port 2(vnet0) hold timer expired [ 39.805415] virbr0: hello timer expired [ 40.285410] virbr0: port 2(vnet0) forward delay timer [ 40.285411] virbr0: port 2(vnet0) entered forwarding state [ 40.285418] virbr0: topology change detected, propagating [ 40.285420] virbr0: decreasing ageing time to 400 [ 40.285427] virbr0: port 2(vnet0) event 16 [ 40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0 [ 40.765408] virbr0: port 2(vnet0) hold timer expired [ 41.805415] virbr0: hello timer expired [ 42.765426] virbr0: port 2(vnet0) hold timer expired [ 43.805425] virbr0: hello timer expired [ 44.765426] virbr0: port 2(vnet0) hold timer expired [ 45.805418] virbr0: hello timer expired and continuing....
On 24.06.20 19:58, Christian Borntraeger wrote: > > > On 24.06.20 18:09, Luis Chamberlain wrote: >> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote: >>> >>> >>> On 24.06.20 16:43, Christoph Hellwig wrote: >>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: >>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the >>>>> linux-bridge on s390? >>>> >>>> Are we even sure this is s390 specific and doesn't happen on other >>>> architectures with the same bridge setup? >>> >>> Fair point. AFAIK nobody has tested this yet on x86. >> >> Regardless, can you enable dynamic debug prints, to see if the kernel >> reveals anything on the bridge code which may be relevant: >> >> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control >> >> Luis > > When I start a guest the following happens with the patch: > > [ 47.420237] virbr0: port 2(vnet0) entered blocking state > [ 47.420242] virbr0: port 2(vnet0) entered disabled state > [ 47.420315] device vnet0 entered promiscuous mode > [ 47.420365] virbr0: port 2(vnet0) event 16 > [ 47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 47.420373] virbr0: toggle option: 12 state: 0 -> 0 > [ 47.420536] virbr0: port 2(vnet0) entered blocking state > [ 47.420538] virbr0: port 2(vnet0) event 16 > [ 47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0 > > and the nothing happens. > > > without the patch > [ 33.805410] virbr0: hello timer expired > [ 35.805413] virbr0: hello timer expired > [ 36.184349] virbr0: port 2(vnet0) entered blocking state > [ 36.184353] virbr0: port 2(vnet0) entered disabled state > [ 36.184427] device vnet0 entered promiscuous mode > [ 36.184479] virbr0: port 2(vnet0) event 16 > [ 36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 36.184487] virbr0: toggle option: 12 state: 0 -> 0 > [ 36.184636] virbr0: port 2(vnet0) entered blocking state > [ 36.184638] virbr0: port 2(vnet0) entered listening state > [ 36.184639] virbr0: port 2(vnet0) event 16 > [ 36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 36.184645] virbr0: port 2(vnet0) event 16 > [ 36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 37.805478] virbr0: hello timer expired > [ 38.205413] virbr0: port 2(vnet0) forward delay timer > [ 38.205414] virbr0: port 2(vnet0) entered learning state > [ 38.205427] virbr0: port 2(vnet0) event 16 > [ 38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 38.765414] virbr0: port 2(vnet0) hold timer expired > [ 39.805415] virbr0: hello timer expired > [ 40.285410] virbr0: port 2(vnet0) forward delay timer > [ 40.285411] virbr0: port 2(vnet0) entered forwarding state > [ 40.285418] virbr0: topology change detected, propagating > [ 40.285420] virbr0: decreasing ageing time to 400 > [ 40.285427] virbr0: port 2(vnet0) event 16 > [ 40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0 > [ 40.765408] virbr0: port 2(vnet0) hold timer expired > [ 41.805415] virbr0: hello timer expired > [ 42.765426] virbr0: port 2(vnet0) hold timer expired > [ 43.805425] virbr0: hello timer expired > [ 44.765426] virbr0: port 2(vnet0) hold timer expired > [ 45.805418] virbr0: hello timer expired > > and continuing.... Just reverting the umh.c parts like this makes the problem go away. diff --git a/kernel/umh.c b/kernel/umh.c index f81e8698e36e..79f139a7ca03 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) * the real error code is already in sub_info->retval or * sub_info->retval is 0 anyway, so don't mess with it then. */ - if (KWIFEXITED(ret)) - sub_info->retval = KWEXITSTATUS(ret); + if (ret) + sub_info->retval = ret; } /* Restore default kernel sig handler */
On 24.06.20 20:09, Christian Borntraeger wrote: > > > On 24.06.20 19:58, Christian Borntraeger wrote: >> >> >> On 24.06.20 18:09, Luis Chamberlain wrote: >>> On Wed, Jun 24, 2020 at 05:54:46PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 24.06.20 16:43, Christoph Hellwig wrote: >>>>> On Wed, Jun 24, 2020 at 01:11:54PM +0200, Christian Borntraeger wrote: >>>>>> Does anyone have an idea why "umh: fix processed error when UMH_WAIT_PROC is used" breaks the >>>>>> linux-bridge on s390? >>>>> >>>>> Are we even sure this is s390 specific and doesn't happen on other >>>>> architectures with the same bridge setup? >>>> >>>> Fair point. AFAIK nobody has tested this yet on x86. >>> >>> Regardless, can you enable dynamic debug prints, to see if the kernel >>> reveals anything on the bridge code which may be relevant: >>> >>> echo "file net/bridge/* +p" > /sys/kernel/debug/dynamic_debug/control >>> >>> Luis >> >> When I start a guest the following happens with the patch: >> >> [ 47.420237] virbr0: port 2(vnet0) entered blocking state >> [ 47.420242] virbr0: port 2(vnet0) entered disabled state >> [ 47.420315] device vnet0 entered promiscuous mode >> [ 47.420365] virbr0: port 2(vnet0) event 16 >> [ 47.420366] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 47.420373] virbr0: toggle option: 12 state: 0 -> 0 >> [ 47.420536] virbr0: port 2(vnet0) entered blocking state >> [ 47.420538] virbr0: port 2(vnet0) event 16 >> [ 47.420539] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> >> and the nothing happens. >> >> >> without the patch >> [ 33.805410] virbr0: hello timer expired >> [ 35.805413] virbr0: hello timer expired >> [ 36.184349] virbr0: port 2(vnet0) entered blocking state >> [ 36.184353] virbr0: port 2(vnet0) entered disabled state >> [ 36.184427] device vnet0 entered promiscuous mode >> [ 36.184479] virbr0: port 2(vnet0) event 16 >> [ 36.184480] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 36.184487] virbr0: toggle option: 12 state: 0 -> 0 >> [ 36.184636] virbr0: port 2(vnet0) entered blocking state >> [ 36.184638] virbr0: port 2(vnet0) entered listening state >> [ 36.184639] virbr0: port 2(vnet0) event 16 >> [ 36.184640] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 36.184645] virbr0: port 2(vnet0) event 16 >> [ 36.184646] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 37.805478] virbr0: hello timer expired >> [ 38.205413] virbr0: port 2(vnet0) forward delay timer >> [ 38.205414] virbr0: port 2(vnet0) entered learning state >> [ 38.205427] virbr0: port 2(vnet0) event 16 >> [ 38.205430] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 38.765414] virbr0: port 2(vnet0) hold timer expired >> [ 39.805415] virbr0: hello timer expired >> [ 40.285410] virbr0: port 2(vnet0) forward delay timer >> [ 40.285411] virbr0: port 2(vnet0) entered forwarding state >> [ 40.285418] virbr0: topology change detected, propagating >> [ 40.285420] virbr0: decreasing ageing time to 400 >> [ 40.285427] virbr0: port 2(vnet0) event 16 >> [ 40.285432] virbr0: br_fill_info event 16 port vnet0 master virbr0 >> [ 40.765408] virbr0: port 2(vnet0) hold timer expired >> [ 41.805415] virbr0: hello timer expired >> [ 42.765426] virbr0: port 2(vnet0) hold timer expired >> [ 43.805425] virbr0: hello timer expired >> [ 44.765426] virbr0: port 2(vnet0) hold timer expired >> [ 45.805418] virbr0: hello timer expired >> >> and continuing.... > > Just reverting the umh.c parts like this makes the problem go away. > > diff --git a/kernel/umh.c b/kernel/umh.c > index f81e8698e36e..79f139a7ca03 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > * the real error code is already in sub_info->retval or > * sub_info->retval is 0 anyway, so don't mess with it then. > */ > - if (KWIFEXITED(ret)) > - sub_info->retval = KWEXITSTATUS(ret); > + if (ret) > + sub_info->retval = ret; > } > > /* Restore default kernel sig handler */ I instrumented this: [ 5.118528] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 9.409235] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.114914] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.116308] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.117690] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.118732] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.119899] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.121365] ret=0x100 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x1 [ 10.128044] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.945814] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.962799] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.983755] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 10.992705] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.118877] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.124359] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.129364] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.142139] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.148385] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.153686] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.158861] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.164068] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.192445] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.200605] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.208493] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.216612] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.226467] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.525363] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.532758] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.535279] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.697660] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.701634] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.818652] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 11.836228] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 12.082266] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 [ 40.965898] ret=0x0 KWIFEXITED(ret) = 0x1 KWEXITSTATUS(ret)= 0x0 So the translations look correct. But your change is actually a sematic change if(ret) will only trigger if there is an error if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD and we did not do it before.
On 24.06.20 20:32, Christian Borntraeger wrote: [...]> > So the translations look correct. But your change is actually a sematic change > if(ret) will only trigger if there is an error > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > and we did not do it before. > So the right fix is diff --git a/kernel/umh.c b/kernel/umh.c index f81e8698e36e..a3a3196e84d1 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) * the real error code is already in sub_info->retval or * sub_info->retval is 0 anyway, so don't mess with it then. */ - if (KWIFEXITED(ret)) + if (KWEXITSTATUS(ret)) sub_info->retval = KWEXITSTATUS(ret); } I think.
On 24.06.20 20:37, Christian Borntraeger wrote: > > > On 24.06.20 20:32, Christian Borntraeger wrote: > [...]> >> So the translations look correct. But your change is actually a sematic change >> if(ret) will only trigger if there is an error >> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD >> and we did not do it before. >> > > So the right fix is > > diff --git a/kernel/umh.c b/kernel/umh.c > index f81e8698e36e..a3a3196e84d1 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > * the real error code is already in sub_info->retval or > * sub_info->retval is 0 anyway, so don't mess with it then. > */ > - if (KWIFEXITED(ret)) > + if (KWEXITSTATUS(ret)) > sub_info->retval = KWEXITSTATUS(ret); > } Ping. Shall I send this as a proper patch or are we merging this fixup in Andrews patch queue?
On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: > > > On 24.06.20 20:32, Christian Borntraeger wrote: > [...]> > > So the translations look correct. But your change is actually a sematic change > > if(ret) will only trigger if there is an error > > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > > and we did not do it before. > > > > So the right fix is > > diff --git a/kernel/umh.c b/kernel/umh.c > index f81e8698e36e..a3a3196e84d1 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > * the real error code is already in sub_info->retval or > * sub_info->retval is 0 anyway, so don't mess with it then. > */ > - if (KWIFEXITED(ret)) > + if (KWEXITSTATUS(ret)) > sub_info->retval = KWEXITSTATUS(ret); > } > > I think. Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). I'm not able to reproduce this on x86 with a bridge. What type of bridge are you using on a guest, or did you mean using KVM so that the *host* can spawn kvm guests? It would be good if you can try to add a bridge manually and see where things fail. Can you do something like this: brctl addbr br0 brctl addif br0 ens6 ip link set dev br0 up Note that most callers are for modprobe. I'd be curious to see which umh is failing which breaks bridge for you. Can you trut this so we can see which umh call is failing? diff --git a/kernel/umh.c b/kernel/umh.c index f81e8698e36e..5ad74bc301d8 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -2,6 +2,9 @@ /* * umh - the kernel usermode helper */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + #include <linux/module.h> #include <linux/sched.h> #include <linux/sched/task.h> @@ -154,8 +157,12 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) * the real error code is already in sub_info->retval or * sub_info->retval is 0 anyway, so don't mess with it then. */ - if (KWIFEXITED(ret)) + printk("== ret: %02x\n", ret); + printk("== KWIFEXITED(ret): %02x\n", KWIFEXITED(ret)); + if (KWIFEXITED(ret)) { + printk("KWEXITSTATUS(ret): %d\n", KWEXITSTATUS(ret)); sub_info->retval = KWEXITSTATUS(ret); + } } /* Restore default kernel sig handler */ @@ -383,6 +390,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, void *data) { struct subprocess_info *sub_info; + unsigned int i = 0; sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask); if (!sub_info) goto out; @@ -394,6 +402,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, #else sub_info->path = path; #endif + pr_info("sub_info->path: %s\n", sub_info->path); + while (argv[i]) + printk(KERN_INFO "%s ", argv[i++]); + printk(KERN_INFO "\n"); + sub_info->argv = argv; sub_info->envp = envp;
On 26.06.20 04:54, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: >> >> >> On 24.06.20 20:32, Christian Borntraeger wrote: >> [...]> >>> So the translations look correct. But your change is actually a sematic change >>> if(ret) will only trigger if there is an error >>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD >>> and we did not do it before. >>> >> >> So the right fix is >> >> diff --git a/kernel/umh.c b/kernel/umh.c >> index f81e8698e36e..a3a3196e84d1 100644 >> --- a/kernel/umh.c >> +++ b/kernel/umh.c >> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) >> * the real error code is already in sub_info->retval or >> * sub_info->retval is 0 anyway, so don't mess with it then. >> */ >> - if (KWIFEXITED(ret)) >> + if (KWEXITSTATUS(ret)) >> sub_info->retval = KWEXITSTATUS(ret); >> } >> >> I think. > > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). But this IS a change over the previous code, no? I will test next week as I am travelling right now.
On Fri, Jun 26, 2020 at 07:22:34AM +0200, Christian Borntraeger wrote: > > > On 26.06.20 04:54, Luis Chamberlain wrote: > > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: > >> > >> > >> On 24.06.20 20:32, Christian Borntraeger wrote: > >> [...]> > >>> So the translations look correct. But your change is actually a sematic change > >>> if(ret) will only trigger if there is an error > >>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > >>> and we did not do it before. > >>> > >> > >> So the right fix is > >> > >> diff --git a/kernel/umh.c b/kernel/umh.c > >> index f81e8698e36e..a3a3196e84d1 100644 > >> --- a/kernel/umh.c > >> +++ b/kernel/umh.c > >> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > >> * the real error code is already in sub_info->retval or > >> * sub_info->retval is 0 anyway, so don't mess with it then. > >> */ > >> - if (KWIFEXITED(ret)) > >> + if (KWEXITSTATUS(ret)) > >> sub_info->retval = KWEXITSTATUS(ret); > >> } > >> > >> I think. > > > > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). > > But this IS a change over the previous code, no? > I will test next week as I am travelling right now. I'm all for reverting back to the previous behavior. If someone wants a behavior change it should be a separate patch. And out of pure self interest I'd like to see that change after my addition of the kernel_wait helper to replace the kernel_wait4 abuse :)
On Fri, Jun 26, 2020 at 10:00:01AM +0100, Christoph Hellwig wrote: > On Fri, Jun 26, 2020 at 07:22:34AM +0200, Christian Borntraeger wrote: > > > > > > On 26.06.20 04:54, Luis Chamberlain wrote: > > > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: > > >> > > >> > > >> On 24.06.20 20:32, Christian Borntraeger wrote: > > >> [...]> > > >>> So the translations look correct. But your change is actually a sematic change > > >>> if(ret) will only trigger if there is an error > > >>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > > >>> and we did not do it before. > > >>> > > >> > > >> So the right fix is > > >> > > >> diff --git a/kernel/umh.c b/kernel/umh.c > > >> index f81e8698e36e..a3a3196e84d1 100644 > > >> --- a/kernel/umh.c > > >> +++ b/kernel/umh.c > > >> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > > >> * the real error code is already in sub_info->retval or > > >> * sub_info->retval is 0 anyway, so don't mess with it then. > > >> */ > > >> - if (KWIFEXITED(ret)) > > >> + if (KWEXITSTATUS(ret)) > > >> sub_info->retval = KWEXITSTATUS(ret); > > >> } > > >> > > >> I think. > > > > > > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). > > > > But this IS a change over the previous code, no? > > I will test next week as I am travelling right now. > > I'm all for reverting back to the previous behavior. If someone wants > a behavior change it should be a separate patch. And out of pure self > interest I'd like to see that change after my addition of the > kernel_wait helper to replace the kernel_wait4 abuse :) Yeah sure, this will be *slightly* cleaner after that. Today we implicitly return -ECHLD as well under the assumption the kernel_wait4() call failed. Andrew, can you please revert these two for now: selftests: simplify kmod failure value umh: fix processed error when UMH_WAIT_PROC is used Later, we'll add Christoph's simplier kernel wait, and make the change directly there to catpure the right error. That still won't fix this reported issue, but it will be cleaner and will go tested by Christian Borntraeger before. Luis
On Fri, Jun 26, 2020 at 11:40:08AM +0000, Luis Chamberlain wrote: > Andrew, can you please revert these two for now: > > selftests: simplify kmod failure value > umh: fix processed error when UMH_WAIT_PROC is used > > Later, we'll add Christoph's simplier kernel wait, and make the change > directly there to catpure the right error. That still won't fix this reported > issue, but it will be cleaner and will go tested by Christian Borntraeger > before. However, note that the only consideration to make here against this approach of the fix later going in with the simpler kernel wait is if this actually is fixing a real issue, and if we'd want this going to stable. We should for sure revert though, so Andrew please do proceed to revert or drop those two patches alone for now. It was unclear to me if this should go to stable given I was not sure how severe that NFS case mentioned on the commit log was, and no one on the NFS side has replied about that yet, however there may be other areas where code inspection of callsites was not sufficient to find the real critical areas. I'm now very curious what this issue that Christian with bridge on s390 found will be. Luis
On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote: > On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: > > > > > > On 24.06.20 20:32, Christian Borntraeger wrote: > > [...]> > > > So the translations look correct. But your change is actually a sematic change > > > if(ret) will only trigger if there is an error > > > if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > > > and we did not do it before. > > > > > > > So the right fix is > > > > diff --git a/kernel/umh.c b/kernel/umh.c > > index f81e8698e36e..a3a3196e84d1 100644 > > --- a/kernel/umh.c > > +++ b/kernel/umh.c > > @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > > * the real error code is already in sub_info->retval or > > * sub_info->retval is 0 anyway, so don't mess with it then. > > */ > > - if (KWIFEXITED(ret)) > > + if (KWEXITSTATUS(ret)) > > sub_info->retval = KWEXITSTATUS(ret); > > } > > > > I think. > > Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). > I'm not able to reproduce this on x86 with a bridge. What type of bridge > are you using on a guest, or did you mean using KVM so that the *host* > can spawn kvm guests? > > It would be good if you can try to add a bridge manually and see where > things fail. Can you do something like this: > > brctl addbr br0 > brctl addif br0 ens6 > ip link set dev br0 up > > Note that most callers are for modprobe. I'd be curious to see which > umh is failing which breaks bridge for you. Can you trut this so we can > see which umh call is failing? Christian, any luck getting to test the code below to see what this reveals? Luis > > diff --git a/kernel/umh.c b/kernel/umh.c > index f81e8698e36e..5ad74bc301d8 100644 > --- a/kernel/umh.c > +++ b/kernel/umh.c > @@ -2,6 +2,9 @@ > /* > * umh - the kernel usermode helper > */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > #include <linux/module.h> > #include <linux/sched.h> > #include <linux/sched/task.h> > @@ -154,8 +157,12 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > * the real error code is already in sub_info->retval or > * sub_info->retval is 0 anyway, so don't mess with it then. > */ > - if (KWIFEXITED(ret)) > + printk("== ret: %02x\n", ret); > + printk("== KWIFEXITED(ret): %02x\n", KWIFEXITED(ret)); > + if (KWIFEXITED(ret)) { > + printk("KWEXITSTATUS(ret): %d\n", KWEXITSTATUS(ret)); > sub_info->retval = KWEXITSTATUS(ret); > + } > } > > /* Restore default kernel sig handler */ > @@ -383,6 +390,7 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > void *data) > { > struct subprocess_info *sub_info; > + unsigned int i = 0; > sub_info = kzalloc(sizeof(struct subprocess_info), gfp_mask); > if (!sub_info) > goto out; > @@ -394,6 +402,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, > #else > sub_info->path = path; > #endif > + pr_info("sub_info->path: %s\n", sub_info->path); > + while (argv[i]) > + printk(KERN_INFO "%s ", argv[i++]); > + printk(KERN_INFO "\n"); > + > sub_info->argv = argv; > sub_info->envp = envp; > >
On 30.06.20 19:57, Luis Chamberlain wrote: > On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote: >> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: >>> >>> >>> On 24.06.20 20:32, Christian Borntraeger wrote: >>> [...]> >>>> So the translations look correct. But your change is actually a sematic change >>>> if(ret) will only trigger if there is an error >>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD >>>> and we did not do it before. >>>> >>> >>> So the right fix is >>> >>> diff --git a/kernel/umh.c b/kernel/umh.c >>> index f81e8698e36e..a3a3196e84d1 100644 >>> --- a/kernel/umh.c >>> +++ b/kernel/umh.c >>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) >>> * the real error code is already in sub_info->retval or >>> * sub_info->retval is 0 anyway, so don't mess with it then. >>> */ >>> - if (KWIFEXITED(ret)) >>> + if (KWEXITSTATUS(ret)) >>> sub_info->retval = KWEXITSTATUS(ret); >>> } >>> >>> I think. >> >> Nope, the right form is to check for WIFEXITED() before using WEXITSTATUS(). >> I'm not able to reproduce this on x86 with a bridge. What type of bridge >> are you using on a guest, or did you mean using KVM so that the *host* >> can spawn kvm guests? >> >> It would be good if you can try to add a bridge manually and see where >> things fail. Can you do something like this: >> >> brctl addbr br0 >> brctl addif br0 ens6 >> ip link set dev br0 up >> >> Note that most callers are for modprobe. I'd be curious to see which >> umh is failing which breaks bridge for you. Can you trut this so we can >> see which umh call is failing? > > Christian, any luck getting to test the code below to see what this > reveals? > > Luis dmesg attached [ 0.137762] Linux version 5.8.0-rc2-next-20200623-00002-g5a5b173022ac-dirty (cborntra@m83lp52.lnxne.boe) (gcc (GCC) 9.3.1 20200317 (Red Hat 9.3.1-1), GNU ld version 2.32-31.fc31) #60 SMP Wed Jul 1 09:14:56 CEST 2020 [ 0.137764] setup: Linux is running natively in 64-bit mode [ 0.137812] setup: The maximum memory size is 131072MB [ 0.137817] setup: Reserving 1024MB of memory at 130048MB for crashkernel (System RAM: 130048MB) [ 0.137901] cpu: 64 configured CPUs, 0 standby CPUs [ 0.137966] cpu: The CPU configuration topology of the machine is: 0 0 4 2 3 10 / 4 [ 0.138533] Write protected kernel read-only data: 13660k [ 0.139239] Zone ranges: [ 0.139240] DMA [mem 0x0000000000000000-0x000000007fffffff] [ 0.139241] Normal [mem 0x0000000080000000-0x0000001fffffffff] [ 0.139243] Movable zone start for each node [ 0.139243] Early memory node ranges [ 0.139244] node 0: [mem 0x0000000000000000-0x0000001fffffffff] [ 0.139251] Initmem setup node 0 [mem 0x0000000000000000-0x0000001fffffffff] [ 0.139252] On node 0 totalpages: 33554432 [ 0.139253] DMA zone: 8192 pages used for memmap [ 0.139253] DMA zone: 0 pages reserved [ 0.139254] DMA zone: 524288 pages, LIFO batch:63 [ 0.152116] Normal zone: 516096 pages used for memmap [ 0.152117] Normal zone: 33030144 pages, LIFO batch:63 [ 0.168380] percpu: Embedded 33 pages/cpu s97536 r8192 d29440 u135168 [ 0.168388] pcpu-alloc: s97536 r8192 d29440 u135168 alloc=33*4096 [ 0.168388] pcpu-alloc: [0] 000 [0] 001 [0] 002 [0] 003 [ 0.168390] pcpu-alloc: [0] 004 [0] 005 [0] 006 [0] 007 [ 0.168391] pcpu-alloc: [0] 008 [0] 009 [0] 010 [0] 011 [ 0.168392] pcpu-alloc: [0] 012 [0] 013 [0] 014 [0] 015 [ 0.168393] pcpu-alloc: [0] 016 [0] 017 [0] 018 [0] 019 [ 0.168394] pcpu-alloc: [0] 020 [0] 021 [0] 022 [0] 023 [ 0.168395] pcpu-alloc: [0] 024 [0] 025 [0] 026 [0] 027 [ 0.168396] pcpu-alloc: [0] 028 [0] 029 [0] 030 [0] 031 [ 0.168397] pcpu-alloc: [0] 032 [0] 033 [0] 034 [0] 035 [ 0.168398] pcpu-alloc: [0] 036 [0] 037 [0] 038 [0] 039 [ 0.168399] pcpu-alloc: [0] 040 [0] 041 [0] 042 [0] 043 [ 0.168400] pcpu-alloc: [0] 044 [0] 045 [0] 046 [0] 047 [ 0.168401] pcpu-alloc: [0] 048 [0] 049 [0] 050 [0] 051 [ 0.168403] pcpu-alloc: [0] 052 [0] 053 [0] 054 [0] 055 [ 0.168404] pcpu-alloc: [0] 056 [0] 057 [0] 058 [0] 059 [ 0.168405] pcpu-alloc: [0] 060 [0] 061 [0] 062 [0] 063 [ 0.168406] pcpu-alloc: [0] 064 [0] 065 [0] 066 [0] 067 [ 0.168407] pcpu-alloc: [0] 068 [0] 069 [0] 070 [0] 071 [ 0.168408] pcpu-alloc: [0] 072 [0] 073 [0] 074 [0] 075 [ 0.168409] pcpu-alloc: [0] 076 [0] 077 [0] 078 [0] 079 [ 0.168410] pcpu-alloc: [0] 080 [0] 081 [0] 082 [0] 083 [ 0.168411] pcpu-alloc: [0] 084 [0] 085 [0] 086 [0] 087 [ 0.168413] pcpu-alloc: [0] 088 [0] 089 [0] 090 [0] 091 [ 0.168414] pcpu-alloc: [0] 092 [0] 093 [0] 094 [0] 095 [ 0.168415] pcpu-alloc: [0] 096 [0] 097 [0] 098 [0] 099 [ 0.168416] pcpu-alloc: [0] 100 [0] 101 [0] 102 [0] 103 [ 0.168417] pcpu-alloc: [0] 104 [0] 105 [0] 106 [0] 107 [ 0.168418] pcpu-alloc: [0] 108 [0] 109 [0] 110 [0] 111 [ 0.168419] pcpu-alloc: [0] 112 [0] 113 [0] 114 [0] 115 [ 0.168420] pcpu-alloc: [0] 116 [0] 117 [0] 118 [0] 119 [ 0.168421] pcpu-alloc: [0] 120 [0] 121 [0] 122 [0] 123 [ 0.168422] pcpu-alloc: [0] 124 [0] 125 [0] 126 [0] 127 [ 0.168424] pcpu-alloc: [0] 128 [0] 129 [0] 130 [0] 131 [ 0.168425] pcpu-alloc: [0] 132 [0] 133 [0] 134 [0] 135 [ 0.168426] pcpu-alloc: [0] 136 [0] 137 [0] 138 [0] 139 [ 0.168427] pcpu-alloc: [0] 140 [0] 141 [0] 142 [0] 143 [ 0.168428] pcpu-alloc: [0] 144 [0] 145 [0] 146 [0] 147 [ 0.168429] pcpu-alloc: [0] 148 [0] 149 [0] 150 [0] 151 [ 0.168430] pcpu-alloc: [0] 152 [0] 153 [0] 154 [0] 155 [ 0.168431] pcpu-alloc: [0] 156 [0] 157 [0] 158 [0] 159 [ 0.168432] pcpu-alloc: [0] 160 [0] 161 [0] 162 [0] 163 [ 0.168433] pcpu-alloc: [0] 164 [0] 165 [0] 166 [0] 167 [ 0.168434] pcpu-alloc: [0] 168 [0] 169 [0] 170 [0] 171 [ 0.168435] pcpu-alloc: [0] 172 [0] 173 [0] 174 [0] 175 [ 0.168436] pcpu-alloc: [0] 176 [0] 177 [0] 178 [0] 179 [ 0.168437] pcpu-alloc: [0] 180 [0] 181 [0] 182 [0] 183 [ 0.168438] pcpu-alloc: [0] 184 [0] 185 [0] 186 [0] 187 [ 0.168439] pcpu-alloc: [0] 188 [0] 189 [0] 190 [0] 191 [ 0.168440] pcpu-alloc: [0] 192 [0] 193 [0] 194 [0] 195 [ 0.168442] pcpu-alloc: [0] 196 [0] 197 [0] 198 [0] 199 [ 0.168443] pcpu-alloc: [0] 200 [0] 201 [0] 202 [0] 203 [ 0.168444] pcpu-alloc: [0] 204 [0] 205 [0] 206 [0] 207 [ 0.168445] pcpu-alloc: [0] 208 [0] 209 [0] 210 [0] 211 [ 0.168446] pcpu-alloc: [0] 212 [0] 213 [0] 214 [0] 215 [ 0.168447] pcpu-alloc: [0] 216 [0] 217 [0] 218 [0] 219 [ 0.168448] pcpu-alloc: [0] 220 [0] 221 [0] 222 [0] 223 [ 0.168449] pcpu-alloc: [0] 224 [0] 225 [0] 226 [0] 227 [ 0.168450] pcpu-alloc: [0] 228 [0] 229 [0] 230 [0] 231 [ 0.168451] pcpu-alloc: [0] 232 [0] 233 [0] 234 [0] 235 [ 0.168452] pcpu-alloc: [0] 236 [0] 237 [0] 238 [0] 239 [ 0.168453] pcpu-alloc: [0] 240 [0] 241 [0] 242 [0] 243 [ 0.168454] pcpu-alloc: [0] 244 [0] 245 [0] 246 [0] 247 [ 0.168455] pcpu-alloc: [0] 248 [0] 249 [0] 250 [0] 251 [ 0.168456] pcpu-alloc: [0] 252 [0] 253 [0] 254 [0] 255 [ 0.168457] pcpu-alloc: [0] 256 [0] 257 [0] 258 [0] 259 [ 0.168459] pcpu-alloc: [0] 260 [0] 261 [0] 262 [0] 263 [ 0.168460] pcpu-alloc: [0] 264 [0] 265 [0] 266 [0] 267 [ 0.168461] pcpu-alloc: [0] 268 [0] 269 [0] 270 [0] 271 [ 0.168462] pcpu-alloc: [0] 272 [0] 273 [0] 274 [0] 275 [ 0.168463] pcpu-alloc: [0] 276 [0] 277 [0] 278 [0] 279 [ 0.168464] pcpu-alloc: [0] 280 [0] 281 [0] 282 [0] 283 [ 0.168465] pcpu-alloc: [0] 284 [0] 285 [0] 286 [0] 287 [ 0.168466] pcpu-alloc: [0] 288 [0] 289 [0] 290 [0] 291 [ 0.168467] pcpu-alloc: [0] 292 [0] 293 [0] 294 [0] 295 [ 0.168468] pcpu-alloc: [0] 296 [0] 297 [0] 298 [0] 299 [ 0.168469] pcpu-alloc: [0] 300 [0] 301 [0] 302 [0] 303 [ 0.168470] pcpu-alloc: [0] 304 [0] 305 [0] 306 [0] 307 [ 0.168471] pcpu-alloc: [0] 308 [0] 309 [0] 310 [0] 311 [ 0.168472] pcpu-alloc: [0] 312 [0] 313 [0] 314 [0] 315 [ 0.168473] pcpu-alloc: [0] 316 [0] 317 [0] 318 [0] 319 [ 0.168474] pcpu-alloc: [0] 320 [0] 321 [0] 322 [0] 323 [ 0.168475] pcpu-alloc: [0] 324 [0] 325 [0] 326 [0] 327 [ 0.168476] pcpu-alloc: [0] 328 [0] 329 [0] 330 [0] 331 [ 0.168477] pcpu-alloc: [0] 332 [0] 333 [0] 334 [0] 335 [ 0.168478] pcpu-alloc: [0] 336 [0] 337 [0] 338 [0] 339 [ 0.168501] Built 1 zonelists, mobility grouping on. Total pages: 33030144 [ 0.168502] Policy zone: Normal [ 0.168503] Kernel command line: root=/dev/disk/by-path/ccw-0.0.3318-part1 rd.dasd=0.0.3318 cio_ignore=all,!condev rd.znet=qeth,0.0.bd00,0.0.bd01,0.0.bd02,layer2=1,portno=0,portname=OSAPORT zfcp.allow_lun_scan=0 BOOT_IMAGE=0 crashkernel=1G brd.rd_size=8000000 brd.rd_nr=8 BOOT_IMAGE=0 [ 0.169679] printk: log_buf_len individual max cpu contribution: 4096 bytes [ 0.169680] printk: log_buf_len total cpu_extra contributions: 1388544 bytes [ 0.169680] printk: log_buf_len min size: 131072 bytes [ 0.169973] printk: log_buf_len: 2097152 bytes [ 0.169974] printk: early log buf free: 123812(94%) [ 0.179095] Dentry cache hash table entries: 8388608 (order: 14, 67108864 bytes, linear) [ 0.183664] Inode-cache hash table entries: 4194304 (order: 13, 33554432 bytes, linear) [ 0.183681] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.212977] Memory: 2312420K/134217728K available (10512K kernel code, 2084K rwdata, 3148K rodata, 4008K init, 856K bss, 3358660K reserved, 0K cma-reserved) [ 0.213438] SLUB: HWalign=256, Order=0-3, MinObjects=0, CPUs=340, Nodes=1 [ 0.213476] ftrace: allocating 31392 entries in 123 pages [ 0.218045] ftrace: allocated 123 pages with 6 groups [ 0.219167] rcu: Hierarchical RCU implementation. [ 0.219167] rcu: RCU event tracing is enabled. [ 0.219168] rcu: RCU restricting CPUs from NR_CPUS=512 to nr_cpu_ids=340. [ 0.219169] Trampoline variant of Tasks RCU enabled. [ 0.219169] Rude variant of Tasks RCU enabled. [ 0.219169] Tracing variant of Tasks RCU enabled. [ 0.219170] rcu: RCU calculated value of scheduler-enlistment delay is 11 jiffies. [ 0.219171] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=340 [ 0.222162] NR_IRQS: 3, nr_irqs: 3, preallocated irqs: 3 [ 0.222313] clocksource: tod: mask: 0xffffffffffffffff max_cycles: 0x3b0a9be803b0a9, max_idle_ns: 1805497147909793 ns [ 0.222534] Console: colour dummy device 80x25 [ 0.323253] printk: console [ttyS0] enabled [ 0.457563] Calibrating delay loop (skipped)... 21881.00 BogoMIPS preset [ 0.457564] pid_max: default: 348160 minimum: 2720 [ 0.457736] LSM: Security Framework initializing [ 0.457767] SELinux: Initializing. [ 0.458030] Mount-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.458174] Mountpoint-cache hash table entries: 131072 (order: 8, 1048576 bytes, linear) [ 0.459174] rcu: Hierarchical SRCU implementation. [ 0.462173] smp: Bringing up secondary CPUs ... [ 0.485719] smp: Brought up 1 node, 64 CPUs [ 1.072386] random: fast init done [ 1.413328] node 0 deferred pages initialised in 930ms [ 1.442064] devtmpfs: initialized [ 1.442586] random: get_random_u32 called from bucket_table_alloc.isra.0+0x82/0x120 with crng_init=1 [ 1.442863] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns [ 1.443545] futex hash table entries: 131072 (order: 13, 33554432 bytes, vmalloc) [ 1.447717] xor: automatically using best checksumming function xc [ 1.447917] NET: Registered protocol family 16 [ 1.447956] audit: initializing netlink subsys (disabled) [ 1.448023] audit: type=2000 audit(1593597944.331:1): state=initialized audit_enabled=0 res=1 [ 1.448147] Spectre V2 mitigation: etokens [ 1.456276] HugeTLB registered 1.00 MiB page size, pre-allocated 0 pages [ 1.702379] raid6: vx128x8 gen() 21910 MB/s [ 1.872319] raid6: vx128x8 xor() 13356 MB/s [ 1.872320] raid6: using algorithm vx128x8 gen() 21910 MB/s [ 1.872321] raid6: .... xor() 13356 MB/s, rmw enabled [ 1.872322] raid6: using s390xc recovery algorithm [ 1.872601] iommu: Default domain type: Translated [ 1.872748] SCSI subsystem initialized [ 1.894528] PCI host bridge to bus 0000:00 [ 1.894533] pci_bus 0000:00: root bus resource [bus 00] [ 1.894535] pci_bus 0000:00: root bus resource [mem 0x8000000000000000-0x8000000007ffffff 64bit pref] [ 1.894600] pci 0000:00:00.0: [1014:044b] type 00 class 0x120000 [ 1.894650] pci 0000:00:00.0: reg 0x10: [mem 0xffffd80008000000-0xffffd8000fffffff 64bit pref] [ 1.894918] pci 0000:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0000:00:00.0 (capable of 32.000 Gb/s with 5.0 GT/s PCIe x8 link) [ 1.894951] pci 0000:00:00.0: Adding to iommu group 0 [ 1.896880] PCI host bridge to bus 0001:00 [ 1.896882] pci_bus 0001:00: root bus resource [bus 00] [ 1.896883] pci_bus 0001:00: root bus resource [mem 0x8001000000000000-0x80010000000fffff 64bit pref] [ 1.896971] pci 0001:00:00.0: [15b3:1016] type 00 class 0x020000 [ 1.897066] pci 0001:00:00.0: reg 0x10: [mem 0xffffd40002000000-0xffffd400020fffff 64bit pref] [ 1.897216] pci 0001:00:00.0: enabling Extended Tags [ 1.897702] pci 0001:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0001:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link) [ 1.897740] pci 0001:00:00.0: Adding to iommu group 1 [ 1.899688] PCI host bridge to bus 0002:00 [ 1.899690] pci_bus 0002:00: root bus resource [bus 00] [ 1.899692] pci_bus 0002:00: root bus resource [mem 0x8002000000000000-0x80020000000fffff 64bit pref] [ 1.899779] pci 0002:00:00.0: [15b3:1016] type 00 class 0x020000 [ 1.899877] pci 0002:00:00.0: reg 0x10: [mem 0xffffd40008000000-0xffffd400080fffff 64bit pref] [ 1.900031] pci 0002:00:00.0: enabling Extended Tags [ 1.900530] pci 0002:00:00.0: 0.000 Gb/s available PCIe bandwidth, limited by Unknown x0 link at 0002:00:00.0 (capable of 63.008 Gb/s with 8.0 GT/s PCIe x8 link) [ 1.900563] pci 0002:00:00.0: Adding to iommu group 2 [ 2.424592] VFS: Disk quotas dquot_6.6.0 [ 2.424648] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 2.425726] random: crng init done [ 2.426438] NET: Registered protocol family 2 [ 2.427232] tcp_listen_portaddr_hash hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 2.427784] TCP established hash table entries: 524288 (order: 10, 4194304 bytes, vmalloc) [ 2.429806] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes, linear) [ 2.430265] TCP: Hash tables configured (established 524288 bind 65536) [ 2.430641] UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc) [ 2.431636] UDP-Lite hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc) [ 2.433139] NET: Registered protocol family 1 [ 2.433180] Trying to unpack rootfs image as initramfs... [ 3.042521] Freeing initrd memory: 46272K [ 3.042920] alg: No test for crc32be (crc32be-vx) [ 3.045455] Initialise system trusted keyrings [ 3.045514] workingset: timestamp_bits=45 max_order=25 bucket_order=0 [ 3.046303] zbud: loaded [ 3.046685] fuse: init (API version 7.31) [ 3.046739] SGI XFS with ACLs, security attributes, realtime, quota, no debug enabled [ 3.049940] Key type asymmetric registered [ 3.049942] Asymmetric key parser 'x509' registered [ 3.049948] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 251) [ 3.050150] io scheduler mq-deadline registered [ 3.050152] io scheduler kyber registered [ 3.050171] io scheduler bfq registered [ 3.051187] atomic64_test: passed [ 3.051236] hvc_iucv: The z/VM IUCV HVC device driver cannot be used without z/VM [ 3.055982] brd: module loaded [ 3.056482] NET: Registered protocol family 10 [ 3.057354] Segment Routing with IPv6 [ 3.057366] NET: Registered protocol family 17 [ 3.057370] Key type dns_resolver registered [ 3.057524] cio: Channel measurement facility initialized using format extended (mode autodetected) [ 3.057802] Discipline DIAG cannot be used without z/VM [ 5.446198] sclp_sd: No data is available for the config data entity [ 5.954014] qeth: loading core functions [ 5.954063] qeth: register layer 2 discipline [ 5.954064] qeth: register layer 3 discipline [ 5.954126] registered taskstats version 1 [ 5.954131] Loading compiled-in X.509 certificates [ 5.993277] Loaded X.509 cert 'Build time autogenerated kernel key: c46ba92ee388c82c5891ee836c9c20b752cdfac5' [ 5.995428] zswap: loaded using pool lzo/zbud [ 5.996258] Key type ._fscrypt registered [ 5.996258] Key type .fscrypt registered [ 5.996259] Key type fscrypt-provisioning registered [ 5.996570] Btrfs loaded, crc32c=crc32c-vx [ 5.996583] ima: No TPM chip found, activating TPM-bypass! [ 5.996586] ima: Allocated hash algorithm: sha256 [ 5.996596] ima: No architecture policies found [ 5.997561] Freeing unused kernel memory: 4008K [ 6.042376] Write protected read-only-after-init data: 80k [ 6.042381] Run /init as init process [ 6.042383] with arguments: [ 6.042383] /init [ 6.042383] with environment: [ 6.042384] HOME=/ [ 6.042384] TERM=linux [ 6.042384] BOOT_IMAGE=0 [ 6.042385] crashkernel=1G [ 6.057157] systemd[1]: Inserted module 'autofs4' [ 6.058276] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified) [ 6.059083] systemd[1]: Detected architecture s390x. [ 6.059085] systemd[1]: Running in initial RAM disk. [ 6.059147] systemd[1]: Set hostname to <m83lp52.lnxne.boe>. [ 6.098479] systemd[1]: Reached target Local File Systems. [ 6.098536] systemd[1]: Reached target Slices. [ 6.098555] systemd[1]: Reached target Swap. [ 6.098570] systemd[1]: Reached target Timers. [ 6.098663] systemd[1]: Listening on Journal Audit Socket. [ 6.098712] systemd[1]: Listening on Journal Socket (/dev/log). [ 6.145682] umh: sub_info->path: /sbin/modprobe [ 6.145683] /sbin/modprobe [ 6.145684] -q [ 6.145685] -- [ 6.145685] sch_fq_codel [ 6.146408] == ret: 100 [ 6.146410] == KWIFEXITED(ret): 01 [ 6.146411] KWEXITSTATUS(ret): 1 [ 6.365672] audit: type=1130 audit(1593597949.251:2): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-journald comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 6.372997] audit: type=1130 audit(1593597949.261:3): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.294273] audit: type=1130 audit(1593597950.181:4): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-cmdline comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.310777] audit: type=1130 audit(1593597950.191:5): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-pre-udev comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.311257] audit: type=1334 audit(1593597950.191:6): prog-id=6 op=LOAD [ 7.311287] audit: type=1334 audit(1593597950.191:7): prog-id=7 op=LOAD [ 7.533584] audit: type=1130 audit(1593597950.421:8): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udevd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.538462] qeth 0.0.bd00: portname is deprecated and is ignored [ 7.539958] dasd-eckd 0.0.3318: A channel path to the device has become operational [ 7.540242] dasd-eckd 0.0.3319: A channel path to the device has become operational [ 7.540743] dasd-eckd 0.0.331a: A channel path to the device has become operational [ 7.540985] dasd-eckd 0.0.331b: A channel path to the device has become operational [ 7.543658] qdio: 0.0.bd02 OSA on SC 159b using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP [ 7.550125] dasd-eckd 0.0.3319: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors [ 7.552945] dasd-eckd 0.0.3319: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout [ 7.554058] dasdb:VOL1/ 0X3319: dasdb1 [ 7.554603] dasd-eckd 0.0.3318: New DASD 3390/0E (CU 3990/01) with 262668 cylinders, 15 heads, 224 sectors [ 7.557378] dasd-eckd 0.0.3318: DASD with 4 KB/block, 189120960 KB total size, 48 KB/track, compatible disk layout [ 7.558349] dasda:VOL1/ 0X3318: dasda1 [ 7.558949] dasd-eckd 0.0.331a: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors [ 7.561612] dasd-eckd 0.0.331a: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout [ 7.563390] dasd-eckd 0.0.331b: New DASD 3390/0C (CU 3990/01) with 30051 cylinders, 15 heads, 224 sectors [ 7.563427] dasdc:VOL1/ 0X3333: dasdc1 dasdc2 dasdc3 [ 7.566603] dasd-eckd 0.0.331b: DASD with 4 KB/block, 21636720 KB total size, 48 KB/track, compatible disk layout [ 7.567763] dasdd:VOL1/ 0X3333: dasdd1 dasdd2 dasdd3 [ 7.575390] qeth 0.0.bd00: QDIO data connection isolation is deactivated [ 7.575863] qeth 0.0.bd00: The device represents a Bridge Capable Port [ 7.576279] audit: type=1130 audit(1593597950.461:9): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-udev-trigger comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.579230] qeth 0.0.bd00: MAC address b6:c4:0f:21:ab:4e successfully registered [ 7.579749] qeth 0.0.bd00: Device is a OSD Express card (level: 0202) with link type OSD_10GIG. [ 7.590981] audit: type=1130 audit(1593597950.471:10): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=plymouth-start comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 7.621333] qeth 0.0.bd00: MAC address de:45:d7:61:c4:13 successfully registered [ 7.623582] qeth 0.0.bd00 encbd00: renamed from eth0 [ 7.812522] mlx5_core 0001:00:00.0: enabling device (0000 -> 0002) [ 7.812608] mlx5_core 0001:00:00.0: firmware version: 14.23.1020 [ 8.035163] dasdconf.sh Warning: 0.0.3319 is already online, not configuring [ 8.037162] dasdconf.sh Warning: 0.0.331b is already online, not configuring [ 8.037885] dasdconf.sh Warning: 0.0.3318 is already online, not configuring [ 8.065688] dasdconf.sh Warning: 0.0.331a is already online, not configuring [ 8.254855] umh: sub_info->path: /sbin/modprobe [ 8.254857] /sbin/modprobe [ 8.254858] -q [ 8.254858] -- [ 8.254859] mlx5_ib [ 8.255376] mlx5_core 0002:00:00.0: enabling device (0000 -> 0002) [ 8.255460] mlx5_core 0002:00:00.0: firmware version: 14.23.1020 [ 8.706511] umh: sub_info->path: /sbin/modprobe [ 8.706542] /sbin/modprobe [ 8.706546] -q [ 8.706555] -- [ 8.706563] mlx5_ib [ 8.708963] mlx5_core 0001:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) [ 8.853364] mlx5_core 0002:00:00.0: MLX5E: StrdRq(0) RqSz(1024) StrdSz(256) RxCqeCmprss(0) [ 9.023031] mlx5_core 0002:00:00.0 enP2s564np0: renamed from eth1 [ 9.223067] mlx5_core 0001:00:00.0 enP1s519np0: renamed from eth0 [ 9.553829] audit: type=1130 audit(1593597952.441:11): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=dracut-initqueue comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 9.570174] audit: type=1130 audit(1593597952.451:12): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=kernel msg='unit=systemd-fsck-root comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 9.578552] EXT4-fs (dasda1): mounted filesystem with ordered data mode. Opts: (null) [ 9.586599] audit: type=1334 audit(1593597952.471:13): prog-id=5 op=UNLOAD [ 9.937200] systemd-journald[484]: Received SIGTERM from PID 1 (systemd). [ 9.954691] printk: systemd: 19 output lines suppressed due to ratelimiting [ 10.278153] SELinux: Permission watch in class filesystem not defined in policy. [ 10.278160] SELinux: Permission watch in class file not defined in policy. [ 10.278161] SELinux: Permission watch_mount in class file not defined in policy. [ 10.278162] SELinux: Permission watch_sb in class file not defined in policy. [ 10.278163] SELinux: Permission watch_with_perm in class file not defined in policy. [ 10.278164] SELinux: Permission watch_reads in class file not defined in policy. [ 10.278167] SELinux: Permission watch in class dir not defined in policy. [ 10.278168] SELinux: Permission watch_mount in class dir not defined in policy. [ 10.278169] SELinux: Permission watch_sb in class dir not defined in policy. [ 10.278170] SELinux: Permission watch_with_perm in class dir not defined in policy. [ 10.278171] SELinux: Permission watch_reads in class dir not defined in policy. [ 10.278174] SELinux: Permission watch in class lnk_file not defined in policy. [ 10.278175] SELinux: Permission watch_mount in class lnk_file not defined in policy. [ 10.278176] SELinux: Permission watch_sb in class lnk_file not defined in policy. [ 10.278177] SELinux: Permission watch_with_perm in class lnk_file not defined in policy. [ 10.278178] SELinux: Permission watch_reads in class lnk_file not defined in policy. [ 10.278180] SELinux: Permission watch in class chr_file not defined in policy. [ 10.278198] SELinux: Permission watch_mount in class chr_file not defined in policy. [ 10.278199] SELinux: Permission watch_sb in class chr_file not defined in policy. [ 10.278200] SELinux: Permission watch_with_perm in class chr_file not defined in policy. [ 10.278201] SELinux: Permission watch_reads in class chr_file not defined in policy. [ 10.278203] SELinux: Permission watch in class blk_file not defined in policy. [ 10.278204] SELinux: Permission watch_mount in class blk_file not defined in policy. [ 10.278205] SELinux: Permission watch_sb in class blk_file not defined in policy. [ 10.278206] SELinux: Permission watch_with_perm in class blk_file not defined in policy. [ 10.278207] SELinux: Permission watch_reads in class blk_file not defined in policy. [ 10.278209] SELinux: Permission watch in class sock_file not defined in policy. [ 10.278211] SELinux: Permission watch_mount in class sock_file not defined in policy. [ 10.278212] SELinux: Permission watch_sb in class sock_file not defined in policy. [ 10.278212] SELinux: Permission watch_with_perm in class sock_file not defined in policy. [ 10.278213] SELinux: Permission watch_reads in class sock_file not defined in policy. [ 10.278216] SELinux: Permission watch in class fifo_file not defined in policy. [ 10.278217] SELinux: Permission watch_mount in class fifo_file not defined in policy. [ 10.278218] SELinux: Permission watch_sb in class fifo_file not defined in policy. [ 10.278219] SELinux: Permission watch_with_perm in class fifo_file not defined in policy. [ 10.278220] SELinux: Permission watch_reads in class fifo_file not defined in policy. [ 10.278262] SELinux: Permission perfmon in class capability2 not defined in policy. [ 10.278263] SELinux: Permission bpf in class capability2 not defined in policy. [ 10.278268] SELinux: Permission perfmon in class cap2_userns not defined in policy. [ 10.278269] SELinux: Permission bpf in class cap2_userns not defined in policy. [ 10.278304] SELinux: Class perf_event not defined in policy. [ 10.278305] SELinux: Class lockdown not defined in policy. [ 10.278306] SELinux: the above unknown classes and permissions will be allowed [ 10.278311] SELinux: policy capability network_peer_controls=1 [ 10.278311] SELinux: policy capability open_perms=1 [ 10.278312] SELinux: policy capability extended_socket_class=1 [ 10.278313] SELinux: policy capability always_check_network=0 [ 10.278314] SELinux: policy capability cgroup_seclabel=1 [ 10.278314] SELinux: policy capability nnp_nosuid_transition=1 [ 10.278315] SELinux: policy capability genfs_seclabel_symlinks=0 [ 10.389095] systemd[1]: Successfully loaded SELinux policy in 339.175ms. [ 10.534855] systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 13.169ms. [ 10.537104] systemd[1]: systemd v243.8-1.fc31 running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=unified) [ 10.537930] systemd[1]: Detected architecture s390x. [ 10.540018] systemd[1]: Set hostname to <m83lp52.lnxne.boe>. [ 10.716298] systemd[1]: /usr/lib/systemd/system/sssd.service:12: PIDFile= references a path below legacy directory /var/run/, updating /var/run/sssd.pid → /run/sssd.pid; please update the unit file accordingly. [ 10.721240] systemd[1]: /usr/lib/systemd/system/iscsid.service:11: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsid.pid → /run/iscsid.pid; please update the unit file accordingly. [ 10.721443] systemd[1]: /usr/lib/systemd/system/iscsiuio.service:13: PIDFile= references a path below legacy directory /var/run/, updating /var/run/iscsiuio.pid → /run/iscsiuio.pid; please update the unit file accordingly. [ 10.748479] systemd[1]: /usr/lib/systemd/system/sssd-kcm.socket:7: ListenStream= references a path below legacy directory /var/run/, updating /var/run/.heim_org.h5l.kcm-socket → /run/.heim_org.h5l.kcm-socket; please update the unit file accordingly. [ 10.781859] systemd[1]: initrd-switch-root.service: Succeeded. [ 10.781985] systemd[1]: Stopped Switch Root. [ 10.782251] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 1. [ 10.810779] EXT4-fs (dasda1): re-mounted. Opts: (null) [ 10.841271] umh: sub_info->path: /sbin/modprobe [ 10.841275] /sbin/modprobe [ 10.841275] -q [ 10.841276] -- [ 10.841277] sch_fq_codel [ 10.889454] == ret: 00 [ 10.889456] == KWIFEXITED(ret): 01 [ 10.889457] KWEXITSTATUS(ret): 0 [ 11.118962] systemd-journald[990]: Received client request to flush runtime journal. [ 11.392641] VFIO - User Level meta-driver version: 0.3 [ 11.490782] genwqe 0000:00:00.0: enabling device (0000 -> 0002) [ 11.506302] dasdconf.sh Warning: 0.0.3319 is already online, not configuring [ 11.507431] dasdconf.sh Warning: 0.0.331a is already online, not configuring [ 11.507992] dasdconf.sh Warning: 0.0.331b is already online, not configuring [ 11.508762] dasdconf.sh Warning: 0.0.3318 is already online, not configuring [ 11.625686] umh: sub_info->path: /sbin/modprobe [ 11.625690] /sbin/modprobe [ 11.625690] -q [ 11.625691] -- [ 11.625692] crypto-ecb(aes) [ 11.627270] == ret: 100 [ 11.627272] == KWIFEXITED(ret): 01 [ 11.627274] KWEXITSTATUS(ret): 1 [ 11.627678] umh: sub_info->path: /sbin/modprobe [ 11.627681] /sbin/modprobe [ 11.627682] -q [ 11.627682] -- [ 11.627683] crypto-cbc(aes) [ 11.628695] == ret: 100 [ 11.628699] == KWIFEXITED(ret): 01 [ 11.628699] KWEXITSTATUS(ret): 1 [ 11.629108] umh: sub_info->path: /sbin/modprobe [ 11.629110] /sbin/modprobe [ 11.629110] -q [ 11.629111] -- [ 11.629112] crypto-xts(aes) [ 11.630032] == ret: 100 [ 11.630034] == KWIFEXITED(ret): 01 [ 11.630035] KWEXITSTATUS(ret): 1 [ 11.630098] umh: sub_info->path: /sbin/modprobe [ 11.630100] /sbin/modprobe [ 11.630100] -q [ 11.630101] -- [ 11.630102] crypto-aes [ 11.630979] == ret: 100 [ 11.630980] == KWIFEXITED(ret): 01 [ 11.631000] KWEXITSTATUS(ret): 1 [ 11.631010] umh: sub_info->path: /sbin/modprobe [ 11.631012] /sbin/modprobe [ 11.631012] -q [ 11.631012] -- [ 11.631013] cryptomgr [ 11.631844] == ret: 00 [ 11.631845] == KWIFEXITED(ret): 01 [ 11.631846] KWEXITSTATUS(ret): 0 [ 11.632291] umh: sub_info->path: /sbin/modprobe [ 11.632293] /sbin/modprobe [ 11.632293] -q [ 11.632294] -- [ 11.632295] crypto-ctr(aes) [ 11.633134] == ret: 100 [ 11.633136] == KWIFEXITED(ret): 01 [ 11.633136] KWEXITSTATUS(ret): 1 [ 11.633211] umh: sub_info->path: /sbin/modprobe [ 11.633213] /sbin/modprobe [ 11.633213] -q [ 11.633214] -- [ 11.633215] crypto-ctr [ 11.639862] == ret: 00 [ 11.639863] == KWIFEXITED(ret): 01 [ 11.639864] KWEXITSTATUS(ret): 0 [ 12.291736] device-mapper: uevent: version 1.0.3 [ 12.291869] device-mapper: ioctl: 4.42.0-ioctl (2020-02-27) initialised: dm-devel@redhat.com [ 12.811519] kauditd_printk_skb: 81 callbacks suppressed [ 12.811520] audit: type=1130 audit(1593597955.691:93): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.811526] audit: type=1131 audit(1593597955.691:94): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dmraid-activation comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.827013] audit: type=1130 audit(1593597955.711:95): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-fsck@dev-disk-by\x2dpath-ccw\x2d0.0.3319\x2dpart1 comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.832192] XFS (dasdb1): Mounting V5 Filesystem [ 12.842643] RPC: Registered named UNIX socket transport module. [ 12.842645] RPC: Registered udp transport module. [ 12.842646] RPC: Registered tcp transport module. [ 12.842647] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 12.855633] XFS (dasdb1): Ending clean mount [ 12.857782] xfs filesystem being mounted at /home supports timestamps until 2038 (0x7fffffff) [ 12.876314] audit: type=1130 audit(1593597955.761:96): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dracut-shutdown comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.896111] audit: type=1130 audit(1593597955.781:97): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=plymouth-read-write comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.896302] audit: type=1130 audit(1593597955.781:98): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=import-state comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.899463] RPC: Registered rdma transport module. [ 12.899466] RPC: Registered rdma backchannel transport module. [ 12.900337] audit: type=1130 audit(1593597955.781:99): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma-load-modules@rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.916905] audit: type=1130 audit(1593597955.801:100): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=rdma comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.952843] audit: type=1130 audit(1593597955.841:101): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-tmpfiles-setup comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success' [ 12.962072] audit: type=1305 audit(1593597955.841:102): op=set audit_enabled=1 old=1 auid=4294967295 ses=4294967295 subj=system_u:system_r:auditd_t:s0 res=1 [ 13.444427] umh: sub_info->path: /sbin/modprobe [ 13.444430] /sbin/modprobe [ 13.444430] -q [ 13.444431] -- [ 13.444432] iptable_filter [ 13.485947] == ret: 00 [ 13.485949] == KWIFEXITED(ret): 01 [ 13.485950] KWEXITSTATUS(ret): 0 [ 13.525155] umh: sub_info->path: /sbin/modprobe [ 13.525156] /sbin/modprobe [ 13.525157] -q [ 13.525157] -- [ 13.525158] ip6table_filter [ 13.575540] == ret: 00 [ 13.575542] == KWIFEXITED(ret): 01 [ 13.575543] KWEXITSTATUS(ret): 0 [ 13.590423] umh: sub_info->path: /sbin/modprobe [ 13.590425] /sbin/modprobe [ 13.590425] -q [ 13.590426] -- [ 13.590427] net-pf-16-proto-12 [ 13.617106] == ret: 00 [ 13.617108] == KWIFEXITED(ret): 01 [ 13.617109] KWEXITSTATUS(ret): 0 [ 13.617147] umh: sub_info->path: /sbin/modprobe [ 13.617148] /sbin/modprobe [ 13.617149] -q [ 13.617149] -- [ 13.617150] nfnetlink-subsys-6 [ 13.638728] == ret: 00 [ 13.638729] == KWIFEXITED(ret): 01 [ 13.638730] KWEXITSTATUS(ret): 0 [ 13.831375] umh: sub_info->path: /sbin/modprobe [ 13.831377] /sbin/modprobe [ 13.831378] -q [ 13.831379] -- [ 13.831379] iptable_security [ 13.846372] == ret: 00 [ 13.846374] == KWIFEXITED(ret): 01 [ 13.846375] KWEXITSTATUS(ret): 0 [ 13.847932] umh: sub_info->path: /sbin/modprobe [ 13.847934] /sbin/modprobe [ 13.847934] -q [ 13.847935] -- [ 13.847935] iptable_raw [ 13.854549] == ret: 00 [ 13.854550] == KWIFEXITED(ret): 01 [ 13.854551] KWEXITSTATUS(ret): 0 [ 13.855856] umh: sub_info->path: /sbin/modprobe [ 13.855857] /sbin/modprobe [ 13.855857] -q [ 13.855858] -- [ 13.855858] iptable_mangle [ 13.860364] == ret: 00 [ 13.860365] == KWIFEXITED(ret): 01 [ 13.860366] KWEXITSTATUS(ret): 0 [ 13.861761] umh: sub_info->path: /sbin/modprobe [ 13.861762] /sbin/modprobe [ 13.861763] -q [ 13.861763] -- [ 13.861764] iptable_nat [ 13.911825] == ret: 00 [ 13.911826] == KWIFEXITED(ret): 01 [ 13.911827] KWEXITSTATUS(ret): 0 [ 13.914686] umh: sub_info->path: /sbin/modprobe [ 13.914688] /sbin/modprobe [ 13.914688] -q [ 13.914689] -- [ 13.914689] ip6table_security [ 13.945628] == ret: 00 [ 13.945630] == KWIFEXITED(ret): 01 [ 13.945630] KWEXITSTATUS(ret): 0 [ 13.947019] umh: sub_info->path: /sbin/modprobe [ 13.947020] /sbin/modprobe [ 13.947020] -q [ 13.947021] -- [ 13.947022] ip6table_raw [ 13.969857] == ret: 00 [ 13.969859] == KWIFEXITED(ret): 01 [ 13.969860] KWEXITSTATUS(ret): 0 [ 13.971549] umh: sub_info->path: /sbin/modprobe [ 13.971551] /sbin/modprobe [ 13.971552] -q [ 13.971553] -- [ 13.971553] ip6table_mangle [ 13.977724] == ret: 00 [ 13.977725] == KWIFEXITED(ret): 01 [ 13.977726] KWEXITSTATUS(ret): 0 [ 13.979392] umh: sub_info->path: /sbin/modprobe [ 13.979393] /sbin/modprobe [ 13.979394] -q [ 13.979394] -- [ 13.979395] ip6table_nat [ 13.985103] == ret: 00 [ 13.985104] == KWIFEXITED(ret): 01 [ 13.985105] KWEXITSTATUS(ret): 0 [ 14.014760] umh: sub_info->path: /sbin/modprobe [ 14.014762] /sbin/modprobe [ 14.014762] -q [ 14.014762] -- [ 14.014763] ipt_conntrack [ 14.039306] mlx5_core 0001:00:00.0 enP1s519np0: Link up [ 14.041835] IPv6: ADDRCONF(NETDEV_CHANGE): enP1s519np0: link becomes ready [ 14.047723] == ret: 00 [ 14.047725] == KWIFEXITED(ret): 01 [ 14.047726] KWEXITSTATUS(ret): 0 [ 14.048012] umh: sub_info->path: /sbin/modprobe [ 14.048014] /sbin/modprobe [ 14.048014] -q [ 14.048015] -- [ 14.048015] ipt_REJECT [ 14.073846] == ret: 00 [ 14.073848] == KWIFEXITED(ret): 01 [ 14.073848] KWEXITSTATUS(ret): 0 [ 14.077239] umh: sub_info->path: /sbin/modprobe [ 14.077241] /sbin/modprobe [ 14.077241] -q [ 14.077242] -- [ 14.077242] ip6t_rpfilter [ 14.099370] == ret: 00 [ 14.099372] == KWIFEXITED(ret): 01 [ 14.099373] KWEXITSTATUS(ret): 0 [ 14.100300] umh: sub_info->path: /sbin/modprobe [ 14.100301] /sbin/modprobe [ 14.100302] -q [ 14.100302] -- [ 14.100303] ip6t_REJECT [ 14.122176] == ret: 00 [ 14.122177] == KWIFEXITED(ret): 01 [ 14.122178] KWEXITSTATUS(ret): 0 [ 14.128047] umh: sub_info->path: /sbin/modprobe [ 14.128048] /sbin/modprobe [ 14.128048] -q [ 14.128049] -- [ 14.128049] ip6t_tcp [ 14.149302] mlx5_core 0002:00:00.0 enP2s564np0: Link up [ 14.173411] == ret: 00 [ 14.173413] == KWIFEXITED(ret): 01 [ 14.173414] KWEXITSTATUS(ret): 0 [ 14.322406] umh: sub_info->path: /sbin/modprobe [ 14.322409] /sbin/modprobe [ 14.322410] -q [ 14.322410] -- [ 14.322411] rtnl-link-bridge [ 14.389015] == ret: 00 [ 14.389017] == KWIFEXITED(ret): 01 [ 14.389018] KWEXITSTATUS(ret): 0 [ 14.389335] umh: sub_info->path: /sbin/modprobe [ 14.389336] /sbin/modprobe [ 14.389336] -q [ 14.389337] -- [ 14.389338] char-major-10-200 [ 14.437420] tun: Universal TUN/TAP device driver, 1.6 [ 14.437604] == ret: 00 [ 14.437606] == KWIFEXITED(ret): 01 [ 14.437606] KWEXITSTATUS(ret): 0 [ 14.438482] virbr0: port 1(virbr0-nic) entered blocking state [ 14.438485] virbr0: port 1(virbr0-nic) entered disabled state [ 14.438635] device virbr0-nic entered promiscuous mode [ 14.439654] umh: sub_info->path: /sbin/bridge-stp [ 14.439656] /sbin/bridge-stp [ 14.439656] virbr0 [ 14.439656] start [ 14.439734] == ret: 00 [ 14.439735] == KWIFEXITED(ret): 01 [ 14.439736] KWEXITSTATUS(ret): 0 [ 14.567072] umh: sub_info->path: /sbin/modprobe [ 14.567074] /sbin/modprobe [ 14.567075] -q [ 14.567075] -- [ 14.567076] ipt_CT [ 14.576352] == ret: 00 [ 14.576353] == KWIFEXITED(ret): 01 [ 14.576354] KWEXITSTATUS(ret): 0 [ 14.576443] umh: sub_info->path: /sbin/modprobe [ 14.576444] /sbin/modprobe [ 14.576444] -q [ 14.576445] -- [ 14.576445] nfct-helper-tftp [ 14.605182] == ret: 00 [ 14.605183] == KWIFEXITED(ret): 01 [ 14.605184] KWEXITSTATUS(ret): 0 [ 14.675162] umh: sub_info->path: /sbin/modprobe [ 14.675164] /sbin/modprobe [ 14.675164] -q [ 14.675165] -- [ 14.675165] ipt_MASQUERADE [ 14.684223] == ret: 00 [ 14.684224] == KWIFEXITED(ret): 01 [ 14.684225] KWEXITSTATUS(ret): 0 [ 14.701460] umh: sub_info->path: /sbin/modprobe [ 14.701462] /sbin/modprobe [ 14.701462] -q [ 14.701462] -- [ 14.701463] ipt_CHECKSUM [ 14.707019] == ret: 00 [ 14.707021] == KWIFEXITED(ret): 01 [ 14.707022] KWEXITSTATUS(ret): 0 [ 14.707948] virbr0: port 1(virbr0-nic) entered blocking state [ 14.731752] virbr0: port 1(virbr0-nic) entered disabled state [ 14.981871] umh: sub_info->path: /sbin/modprobe [ 14.981873] /sbin/modprobe [ 14.981873] -q [ 14.981874] -- [ 14.981874] char-major-10-232 [ 15.014374] == ret: 00 [ 15.014375] == KWIFEXITED(ret): 01 [ 15.014376] KWEXITSTATUS(ret): 0 [ 15.102392] IPv6: ADDRCONF(NETDEV_CHANGE): enP2s564np0: link becomes ready [ 20.341424] virbr0: port 2(vnet0) entered blocking state [ 20.341429] virbr0: port 2(vnet0) entered disabled state [ 20.341514] device vnet0 entered promiscuous mode [ 20.341714] virbr0: port 2(vnet0) entered blocking state [ 20.341988] umh: sub_info->path: /sbin/modprobe [ 20.341989] /sbin/modprobe [ 20.341989] -q [ 20.341990] -- [ 20.341991] char-major-10-238 [ 20.408823] == ret: 00 [ 20.408825] == KWIFEXITED(ret): 01 [ 20.408826] KWEXITSTATUS(ret): 0
On 2020/07/01 19:08, Christian Borntraeger wrote: > > > On 30.06.20 19:57, Luis Chamberlain wrote: >> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote: >>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: >>>> >>>> >>>> On 24.06.20 20:32, Christian Borntraeger wrote: >>>> [...]> >>>>> So the translations look correct. But your change is actually a sematic change >>>>> if(ret) will only trigger if there is an error >>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD >>>>> and we did not do it before. >>>>> >>>> >>>> So the right fix is >>>> >>>> diff --git a/kernel/umh.c b/kernel/umh.c >>>> index f81e8698e36e..a3a3196e84d1 100644 >>>> --- a/kernel/umh.c >>>> +++ b/kernel/umh.c >>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) >>>> * the real error code is already in sub_info->retval or >>>> * sub_info->retval is 0 anyway, so don't mess with it then. >>>> */ >>>> - if (KWIFEXITED(ret)) >>>> + if (KWEXITSTATUS(ret)) >>>> sub_info->retval = KWEXITSTATUS(ret); Well, it is not br_stp_call_user() but br_stp_start() which is expecting to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix processed error when UMH_WAIT_PROC is used" will be the correct behavior).
On Wed, Jul 01, 2020 at 12:08:11PM +0200, Christian Borntraeger wrote: > dmesg attached > [ 14.438482] virbr0: port 1(virbr0-nic) entered blocking state > [ 14.438485] virbr0: port 1(virbr0-nic) entered disabled state > [ 14.438635] device virbr0-nic entered promiscuous mode > [ 14.439654] umh: sub_info->path: /sbin/bridge-stp > [ 14.439656] /sbin/bridge-stp > [ 14.439656] virbr0 > [ 14.439656] start OK so what we seem to want to debug is the umh call for: /sbin/bridge-stp virbr0 start > [ 14.439734] == ret: 00 > [ 14.439735] == KWIFEXITED(ret): 01 > [ 14.439736] KWEXITSTATUS(ret): 0 Its not clear if this is the respective return value, but now that we have a clue that this is the the only non-modprobe call, we should have a clearer certainty that this is the issue. Indeed my patch "umh: fix processed error when UMH_WAIT_PROC is used" did modify bridge-stp in the following way: diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index ba55851fe132..bdd94b45396b 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -133,14 +133,8 @@ static int br_stp_call_user(struct net_bridge *br, char *arg) /* call userspace STP and report program errors */ rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC); - if (rc > 0) { - if (rc & 0xff) - br_debug(br, BR_STP_PROG " received signal %d\n", - rc & 0x7f); - else - br_debug(br, BR_STP_PROG " exited with code %d\n", - (rc >> 8) & 0xff); - } + if (rc != 0) + br_debug(br, BR_STP_PROG " failed with exit code %d\n", rc); return rc; } If you look at this carefully though you'll notice that the change just modifies *when* we issue the debug print. The more important relevant part of the patch however was that we now do return a correct error value when the call fails. More importantly, depending on if an error or not we run the kernel STP or userspace STP later: static void br_stp_start(struct net_bridge *br) { int err = -ENOENT; if (net_eq(dev_net(br->dev), &init_net)) err = br_stp_call_user(br, "start"); if (err && err != -ENOENT) br_err(br, "failed to start userspace STP (%d)\n", err); spin_lock_bh(&br->lock); if (br->bridge_forward_delay < BR_MIN_FORWARD_DELAY) __br_set_forward_delay(br, BR_MIN_FORWARD_DELAY); else if (br->bridge_forward_delay > BR_MAX_FORWARD_DELAY) __br_set_forward_delay(br, BR_MAX_FORWARD_DELAY); ---------------------> can you enable debug print for this to see what ---------------------> you end up using? if (!err) { br->stp_enabled = BR_USER_STP; br_debug(br, "userspace STP started\n"); } else { br->stp_enabled = BR_KERNEL_STP; br_debug(br, "using kernel STP\n"); /* To start timers on any ports left in blocking */ if (br->dev->flags & IFF_UP) mod_timer(&br->hello_timer, jiffies + br->hello_time); br_port_state_selection(br); } -----------------> spin_unlock_bh(&br->lock); }
On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote: > On 2020/07/01 19:08, Christian Borntraeger wrote: > > > > > > On 30.06.20 19:57, Luis Chamberlain wrote: > >> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote: > >>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: > >>>> > >>>> > >>>> On 24.06.20 20:32, Christian Borntraeger wrote: > >>>> [...]> > >>>>> So the translations look correct. But your change is actually a sematic change > >>>>> if(ret) will only trigger if there is an error > >>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD > >>>>> and we did not do it before. > >>>>> > >>>> > >>>> So the right fix is > >>>> > >>>> diff --git a/kernel/umh.c b/kernel/umh.c > >>>> index f81e8698e36e..a3a3196e84d1 100644 > >>>> --- a/kernel/umh.c > >>>> +++ b/kernel/umh.c > >>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > >>>> * the real error code is already in sub_info->retval or > >>>> * sub_info->retval is 0 anyway, so don't mess with it then. > >>>> */ > >>>> - if (KWIFEXITED(ret)) > >>>> + if (KWEXITSTATUS(ret)) > >>>> sub_info->retval = KWEXITSTATUS(ret); > > Well, it is not br_stp_call_user() but br_stp_start() which is expecting > to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. > That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix > processed error when UMH_WAIT_PROC is used" will be the correct behavior). br_stp_start() doesn't check for the raw value, it just checks for err or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is used" propagates the correct error now. Christian, can you try removing the binary temporarily and seeing if you get your bridge working? Luis
On 2020/07/01 22:53, Luis Chamberlain wrote: >> Well, it is not br_stp_call_user() but br_stp_start() which is expecting >> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. >> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix >> processed error when UMH_WAIT_PROC is used" will be the correct behavior). > > br_stp_start() doesn't check for the raw value, it just checks for err > or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is > used" propagates the correct error now. No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()).
On 01.07.20 15:53, Luis Chamberlain wrote: > On Wed, Jul 01, 2020 at 10:24:29PM +0900, Tetsuo Handa wrote: >> On 2020/07/01 19:08, Christian Borntraeger wrote: >>> >>> >>> On 30.06.20 19:57, Luis Chamberlain wrote: >>>> On Fri, Jun 26, 2020 at 02:54:10AM +0000, Luis Chamberlain wrote: >>>>> On Wed, Jun 24, 2020 at 08:37:55PM +0200, Christian Borntraeger wrote: >>>>>> >>>>>> >>>>>> On 24.06.20 20:32, Christian Borntraeger wrote: >>>>>> [...]> >>>>>>> So the translations look correct. But your change is actually a sematic change >>>>>>> if(ret) will only trigger if there is an error >>>>>>> if (KWIFEXITED(ret)) will always trigger when the process ends. So we will always overwrite -ECHILD >>>>>>> and we did not do it before. >>>>>>> >>>>>> >>>>>> So the right fix is >>>>>> >>>>>> diff --git a/kernel/umh.c b/kernel/umh.c >>>>>> index f81e8698e36e..a3a3196e84d1 100644 >>>>>> --- a/kernel/umh.c >>>>>> +++ b/kernel/umh.c >>>>>> @@ -154,7 +154,7 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) >>>>>> * the real error code is already in sub_info->retval or >>>>>> * sub_info->retval is 0 anyway, so don't mess with it then. >>>>>> */ >>>>>> - if (KWIFEXITED(ret)) >>>>>> + if (KWEXITSTATUS(ret)) >>>>>> sub_info->retval = KWEXITSTATUS(ret); >> >> Well, it is not br_stp_call_user() but br_stp_start() which is expecting >> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. >> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix >> processed error when UMH_WAIT_PROC is used" will be the correct behavior). > > br_stp_start() doesn't check for the raw value, it just checks for err > or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is > used" propagates the correct error now. > > Christian, can you try removing the binary temporarily and seeing if > you get your bridge working? As I matter of fact I do NOT have /sbin/bridge-stp installed.
On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote: > On 2020/07/01 22:53, Luis Chamberlain wrote: > >> Well, it is not br_stp_call_user() but br_stp_start() which is expecting > >> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. > >> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix > >> processed error when UMH_WAIT_PROC is used" will be the correct behavior). > > > > br_stp_start() doesn't check for the raw value, it just checks for err > > or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is > > used" propagates the correct error now. > > No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV > (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), > br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that > /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't > ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()). Ah, well that would be a different fix required, becuase again, br_stp_start() does not untangle the correct error today really. I also I think it would be odd odd that SIGSEGV or another signal is what was terminating Christian's bridge stp call, but let's find out! Note we pass 0 to the options to wait so the mistake here could indeed be that we did not need KWIFSIGNALED(). I was afraid of this prospect... as it other implications. It means we either *open code* all callers, or we handle this in a unified way on the umh. And if we do handle this in a unified way, it then begs the question as to *what* do we pass for the signals case and continued case. Below we just pass the signal, and treat continued as OK, but treating continued as OK would also be a *new* change as well. For instance (this goes just boot tested, but Christian if you can try this as well that would be appreciated): diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index bba06befbff5..d1898f5dd1fc 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -105,10 +105,12 @@ extern long kernel_wait4(pid_t, int __user *, int, struct rusage *); /* Only add helpers for actual use cases in the kernel */ #define KWEXITSTATUS(status) (__KWEXITSTATUS(status)) +#define KWTERMSIG(status) (__KWTERMSIG(status)) +#define KWSTOPSIG(status) (__KWSTOPSIG(status)) #define KWIFEXITED(status) (__KWIFEXITED(status)) - -/* Nonzero if STATUS indicates normal termination. */ -#define __KWIFEXITED(status) (__KWTERMSIG(status) == 0) +#define KWIFSIGNALED(status) (__KWIFSIGNALED(status)) +#define KWIFSTOPPED(status) (__KWIFSTOPPED(status)) +#define KWIFCONTINUED(status) (__KWIFCONTINUED(status)) /* If KWIFEXITED(STATUS), the low-order 8 bits of the status. */ #define __KWEXITSTATUS(status) (((status) & 0xff00) >> 8) @@ -116,6 +118,24 @@ extern long kernel_wait4(pid_t, int __user *, int, struct rusage *); /* If KWIFSIGNALED(STATUS), the terminating signal. */ #define __KWTERMSIG(status) ((status) & 0x7f) +/* If KWIFSTOPPED(STATUS), the signal that stopped the child. */ +#define __KWSTOPSIG(status) __KWEXITSTATUS(status) + +/* Nonzero if STATUS indicates normal termination. */ +#define __KWIFEXITED(status) (__KWTERMSIG(status) == 0) + +/* Nonzero if STATUS indicates termination by a signal. */ +#define __KWIFSIGNALED(status) \ + (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) + +/* Nonzero if STATUS indicates the child is stopped. */ +#define __KWIFSTOPPED(status) (((status) & 0xff) == 0x7f) + +/* Nonzero if STATUS indicates the child continued after a stop. */ +#define __KWIFCONTINUED(status) ((status) == __KW_CONTINUED) + +#define __KW_CONTINUED 0xffff + extern void free_task(struct task_struct *tsk); /* sched_exec is called by processes performing an exec */ diff --git a/kernel/umh.c b/kernel/umh.c index f81e8698e36e..c98fb1ed90c9 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) */ if (KWIFEXITED(ret)) sub_info->retval = KWEXITSTATUS(ret); + /* + * Do we really want to be passing the signal, or do we pass + * a single error code for all cases? + */ + else if (KWIFSIGNALED(ret)) + sub_info->retval = KWTERMSIG(ret); + /* Same here */ + else if (KWIFSTOPPED((ret))) + sub_info->retval = KWSTOPSIG(ret); + /* And are we really sure we want this? */ + else if (KWIFCONTINUED((ret))) + sub_info->retval = 0; } /* Restore default kernel sig handler */
On 01.07.20 17:38, Luis Chamberlain wrote: > On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote: >> On 2020/07/01 22:53, Luis Chamberlain wrote: >>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting >>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. >>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix >>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior). >>> >>> br_stp_start() doesn't check for the raw value, it just checks for err >>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is >>> used" propagates the correct error now. >> >> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV >> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), >> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that >> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't >> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()). > > Ah, well that would be a different fix required, becuase again, > br_stp_start() does not untangle the correct error today really. > I also I think it would be odd odd that SIGSEGV or another signal > is what was terminating Christian's bridge stp call, but let's > find out! > > Note we pass 0 to the options to wait so the mistake here could indeed > be that we did not need KWIFSIGNALED(). I was afraid of this prospect... > as it other implications. > > It means we either *open code* all callers, or we handle this in a > unified way on the umh. And if we do handle this in a unified way, it > then begs the question as to *what* do we pass for the signals case and > continued case. Below we just pass the signal, and treat continued as > OK, but treating continued as OK would also be a *new* change as well. > > For instance (this goes just boot tested, but Christian if you can > try this as well that would be appreciated): Does not help, the bridge stays in DOWN state.
On Wed, Jul 01, 2020 at 05:48:48PM +0200, Christian Borntraeger wrote: > > > On 01.07.20 17:38, Luis Chamberlain wrote: > > On Wed, Jul 01, 2020 at 11:08:57PM +0900, Tetsuo Handa wrote: > >> On 2020/07/01 22:53, Luis Chamberlain wrote: > >>>> Well, it is not br_stp_call_user() but br_stp_start() which is expecting > >>>> to set sub_info->retval for both KWIFEXITED() case and KWIFSIGNALED() case. > >>>> That is, sub_info->retval needs to carry raw value (i.e. without "umh: fix > >>>> processed error when UMH_WAIT_PROC is used" will be the correct behavior). > >>> > >>> br_stp_start() doesn't check for the raw value, it just checks for err > >>> or !err. So the patch, "umh: fix processed error when UMH_WAIT_PROC is > >>> used" propagates the correct error now. > >> > >> No. If "/sbin/bridge-stp virbr0 start" terminated due to e.g. SIGSEGV > >> (for example, by inserting "kill -SEGV $$" into right after "#!/bin/sh" line), > >> br_stp_start() needs to select BR_KERNEL_STP path. We can't assume that > >> /sbin/bridge-stp is always terminated by exit() syscall (and hence we can't > >> ignore KWIFSIGNALED() case in call_usermodehelper_exec_sync()). > > > > Ah, well that would be a different fix required, becuase again, > > br_stp_start() does not untangle the correct error today really. > > I also I think it would be odd odd that SIGSEGV or another signal > > is what was terminating Christian's bridge stp call, but let's > > find out! > > > > Note we pass 0 to the options to wait so the mistake here could indeed > > be that we did not need KWIFSIGNALED(). I was afraid of this prospect... > > as it other implications. > > > > It means we either *open code* all callers, or we handle this in a > > unified way on the umh. And if we do handle this in a unified way, it > > then begs the question as to *what* do we pass for the signals case and > > continued case. Below we just pass the signal, and treat continued as > > OK, but treating continued as OK would also be a *new* change as well. > > > > For instance (this goes just boot tested, but Christian if you can > > try this as well that would be appreciated): > > > Does not help, the bridge stays in DOWN state. OK thanks for testing, that was fast! Does your code go through the STP kernel path or userpath? If it is taking the STP kernel path then this is not the real culprit to your issue then. Luis
On 01.07.20 17:58, Luis Chamberlain wrote: [...] >>> >>> Ah, well that would be a different fix required, becuase again, >>> br_stp_start() does not untangle the correct error today really. >>> I also I think it would be odd odd that SIGSEGV or another signal >>> is what was terminating Christian's bridge stp call, but let's >>> find out! >>> >>> Note we pass 0 to the options to wait so the mistake here could indeed >>> be that we did not need KWIFSIGNALED(). I was afraid of this prospect... >>> as it other implications. >>> >>> It means we either *open code* all callers, or we handle this in a >>> unified way on the umh. And if we do handle this in a unified way, it >>> then begs the question as to *what* do we pass for the signals case and >>> continued case. Below we just pass the signal, and treat continued as >>> OK, but treating continued as OK would also be a *new* change as well. >>> >>> For instance (this goes just boot tested, but Christian if you can >>> try this as well that would be appreciated): >> >> >> Does not help, the bridge stays in DOWN state. > > OK thanks for testing, that was fast! Does your code go through the > STP kernel path or userpath? If it is taking the STP kernel path > then this is not the real culprit to your issue then. I have no idea and I cannot look into this right now. I can test patches as compile,reboot and test is almost no effort. FWIW, this is just the network of a KVM guest of libvirts default network no longer working, maybe you can reproduce this on x86 as well?
On 2020/07/02 0:38, Luis Chamberlain wrote: > @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > */ > if (KWIFEXITED(ret)) > sub_info->retval = KWEXITSTATUS(ret); > + /* > + * Do we really want to be passing the signal, or do we pass > + * a single error code for all cases? > + */ > + else if (KWIFSIGNALED(ret)) > + sub_info->retval = KWTERMSIG(ret); No, this is bad. Caller of usermode helper is unable to distinguish exit(9) and e.g. SIGKILL'ed by the OOM-killer. Please pass raw exit status value. I feel that caller of usermode helper should not use exit status value. For example, call_sbin_request_key() is checking test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0 condition (if usermode helper was invoked) in order to "ignore any errors from userspace if the key was instantiated". > + /* Same here */ > + else if (KWIFSTOPPED((ret))) > + sub_info->retval = KWSTOPSIG(ret); > + /* And are we really sure we want this? */ > + else if (KWIFCONTINUED((ret))) > + sub_info->retval = 0; > } > > /* Restore default kernel sig handler */ >
On Thu, Jul 02, 2020 at 01:26:53PM +0900, Tetsuo Handa wrote: > On 2020/07/02 0:38, Luis Chamberlain wrote: > > @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) > > */ > > if (KWIFEXITED(ret)) > > sub_info->retval = KWEXITSTATUS(ret); > > + /* > > + * Do we really want to be passing the signal, or do we pass > > + * a single error code for all cases? > > + */ > > + else if (KWIFSIGNALED(ret)) > > + sub_info->retval = KWTERMSIG(ret); > > No, this is bad. Caller of usermode helper is unable to distinguish exit(9) > and e.g. SIGKILL'ed by the OOM-killer. Right, the question is: do we care? > Please pass raw exit status value. > > I feel that caller of usermode helper should not use exit status value. > For example, call_sbin_request_key() is checking > > test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0 > > condition (if usermode helper was invoked) in order to "ignore any errors from > userspace if the key was instantiated". For those not familiar with this code path, or if you cannot decipher the above, the code path in question was: static int call_sbin_request_key(struct key *authkey, void *aux) { ... /* do it */ ret = call_usermodehelper_keys(request_key, argv, envp, keyring, UMH_WAIT_PROC); kdebug("usermode -> 0x%x", ret); if (ret >= 0) { /* ret is the exit/wait code */ if (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0) ret = -ENOKEY; /* ignore any errors from userspace if the key was * instantiated */ ret = 0; } ... } And the umh patch "umh: fix processed error when UMH_WAIT_PROC is used" changed this to: - if (ret >= 0) { + if (ret != 0) { Prior to the patch negative return values from userspace were still being captured, and likewise signals, but the error value was not raw, not the actual value. After the patch, since we check for ret != 0 we still upkeep the sanity check for any error, correct the error value, but as you noted signals were ignored as I made the wrong assumption we would ignore them. The umh sub_info->retval is set after my original patch only if KWIFSIGNALED(ret)), and ignored signals, and so that would be now capitured with the additional KWIFSIGNALED(ret)) check. The question still stands: Do we want to open code all these checks or simply wrap them up in the umh. If we do the later, as you note exit(9) and a SIGKILL will be the same to the inspector in the kernel. But do we care? Do we really want umh callers differntiatin between signals and exit values? The alternative to making a compromise is using generic wrappers for things which make sense and letting the callers use those. Luis > > + /* Same here */ > > + else if (KWIFSTOPPED((ret))) > > + sub_info->retval = KWSTOPSIG(ret); > > + /* And are we really sure we want this? */ > > + else if (KWIFCONTINUED((ret))) > > + sub_info->retval = 0; > > } > > > > /* Restore default kernel sig handler */ > > >
On 2020/07/03 4:46, Luis Chamberlain wrote: > On Thu, Jul 02, 2020 at 01:26:53PM +0900, Tetsuo Handa wrote: >> On 2020/07/02 0:38, Luis Chamberlain wrote: >>> @@ -156,6 +156,18 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) >>> */ >>> if (KWIFEXITED(ret)) >>> sub_info->retval = KWEXITSTATUS(ret); >>> + /* >>> + * Do we really want to be passing the signal, or do we pass >>> + * a single error code for all cases? >>> + */ >>> + else if (KWIFSIGNALED(ret)) >>> + sub_info->retval = KWTERMSIG(ret); >> >> No, this is bad. Caller of usermode helper is unable to distinguish exit(9) >> and e.g. SIGKILL'ed by the OOM-killer. > > Right, the question is: do we care? Yes, we have to care. > And the umh patch "umh: fix processed error when UMH_WAIT_PROC is used" > changed this to: > > - if (ret >= 0) { > + if (ret != 0) { > > Prior to the patch negative return values from userspace were still > being captured, and likewise signals, but the error value was not > raw, not the actual value. After the patch, since we check for ret != 0 > we still upkeep the sanity check for any error, correct the error value, > but as you noted signals were ignored as I made the wrong assumption > we would ignore them. The umh sub_info->retval is set after my original > patch only if KWIFSIGNALED(ret)), and ignored signals, and so that > would be now capitured with the additional KWIFSIGNALED(ret)) check. "call_usermodehelper_keys() == 0" (i.e. usermode helper was successfully started and successfully terminated via exit(0)) is different from "there is nothing to do". call_sbin_request_key() == 0 case still has to check for possibility of -ENOKEY case. > > The question still stands: > > Do we want to open code all these checks or simply wrap them up in > the umh. If we do the later, as you note exit(9) and a SIGKILL will > be the same to the inspector in the kernel. But do we care? Yes, we do care. > > Do we really want umh callers differntiatin between signals and exit values? Yes, we do. > > The alternative to making a compromise is using generic wrappers for > things which make sense and letting the callers use those. I suggest just introducing KWIFEXITED()/KWEXITSTATUS()/KWIFSIGNALED()/KWTERMSIG() macros and fixing the callers, for some callers are not aware of possibility of KWIFSIGNALED() case. For example, conn_try_outdate_peer() in drivers/block/drbd/drbd_nl.c misbehaves if drbd_usermode_helper process was terminated by a signal, for the switch() statement after returning from conn_helper() is assuming that the return value of conn_helper() is a KWEXITSTATUS() value if drbd_usermode_helper process was successfully started. If drbd_usermode_helper process was terminated by SIGQUIT (which is 3), conn_try_outdate_peer() will by error hit "case P_INCONSISTENT:" (which is 3); conn_try_outdate_peer() should hit "default: /* The script is broken ... */" unless KWIFEXITED() == true. Your patch is trying to obnubilate the return code.
On Fri, Jul 03, 2020 at 09:52:01AM +0900, Tetsuo Handa wrote: > On 2020/07/03 4:46, Luis Chamberlain wrote: > > The alternative to making a compromise is using generic wrappers for > > things which make sense and letting the callers use those. > > I suggest just introducing KWIFEXITED()/KWEXITSTATUS()/KWIFSIGNALED()/KWTERMSIG() > macros and fixing the callers, for some callers are not aware of possibility of > KWIFSIGNALED() case. OK so we open code all uses. Do that in a next iteration. Luis
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index da4a3ebe04ef..aee272e620b9 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -382,13 +382,11 @@ int drbd_khelper(struct drbd_device *device, char *cmd) notify_helper(NOTIFY_CALL, device, connection, cmd, 0); ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC); if (ret) - drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n", - drbd_usermode_helper, cmd, mb, - (ret >> 8) & 0xff, ret); + drbd_warn(device, "helper command: %s %s %s failed with exit code %u (0x%x)\n", + drbd_usermode_helper, cmd, mb, ret, ret); else - drbd_info(device, "helper command: %s %s %s exit code %u (0x%x)\n", - drbd_usermode_helper, cmd, mb, - (ret >> 8) & 0xff, ret); + drbd_info(device, "helper command: %s %s %s completed successfully\n", + drbd_usermode_helper, cmd, mb); sib.sib_reason = SIB_HELPER_POST; sib.helper_exit_code = ret; drbd_bcast_event(device, &sib); @@ -424,13 +422,11 @@ enum drbd_peer_state conn_khelper(struct drbd_connection *connection, char *cmd) ret = call_usermodehelper(drbd_usermode_helper, argv, envp, UMH_WAIT_PROC); if (ret) - drbd_warn(connection, "helper command: %s %s %s exit code %u (0x%x)\n", - drbd_usermode_helper, cmd, resource_name, - (ret >> 8) & 0xff, ret); + drbd_warn(connection, "helper command: %s %s %s failed with exit code %u (0x%x)\n", + drbd_usermode_helper, cmd, resource_name, ret, ret); else - drbd_info(connection, "helper command: %s %s %s exit code %u (0x%x)\n", - drbd_usermode_helper, cmd, resource_name, - (ret >> 8) & 0xff, ret); + drbd_info(connection, "helper command: %s %s %s completed successfully\n", + drbd_usermode_helper, cmd, resource_name); /* TODO: conn_bcast_event() ?? */ notify_helper(NOTIFY_RESPONSE, NULL, connection, cmd, ret); diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c index 9e40dfecf1b1..33e6a7fd7961 100644 --- a/fs/nfsd/nfs4recover.c +++ b/fs/nfsd/nfs4recover.c @@ -1820,7 +1820,7 @@ nfsd4_umh_cltrack_upcall(char *cmd, char *arg, char *env0, char *env1) ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC); /* - * Disable the upcall mechanism if we're getting an ENOENT or EACCES + * Disable the upcall mechanism if we're getting an -ENOENT or -EACCES * error. The admin can re-enable it on the fly by using sysfs * once the problem has been fixed. */ diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 38359071236a..bba06befbff5 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -103,6 +103,19 @@ struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); extern long kernel_wait4(pid_t, int __user *, int, struct rusage *); +/* Only add helpers for actual use cases in the kernel */ +#define KWEXITSTATUS(status) (__KWEXITSTATUS(status)) +#define KWIFEXITED(status) (__KWIFEXITED(status)) + +/* Nonzero if STATUS indicates normal termination. */ +#define __KWIFEXITED(status) (__KWTERMSIG(status) == 0) + +/* If KWIFEXITED(STATUS), the low-order 8 bits of the status. */ +#define __KWEXITSTATUS(status) (((status) & 0xff00) >> 8) + +/* If KWIFSIGNALED(STATUS), the terminating signal. */ +#define __KWTERMSIG(status) ((status) & 0x7f) + extern void free_task(struct task_struct *tsk); /* sched_exec is called by processes performing an exec */ diff --git a/kernel/umh.c b/kernel/umh.c index 79f139a7ca03..f81e8698e36e 100644 --- a/kernel/umh.c +++ b/kernel/umh.c @@ -154,8 +154,8 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) * the real error code is already in sub_info->retval or * sub_info->retval is 0 anyway, so don't mess with it then. */ - if (ret) - sub_info->retval = ret; + if (KWIFEXITED(ret)) + sub_info->retval = KWEXITSTATUS(ret); } /* Restore default kernel sig handler */ diff --git a/net/bridge/br_stp_if.c b/net/bridge/br_stp_if.c index ba55851fe132..bdd94b45396b 100644 --- a/net/bridge/br_stp_if.c +++ b/net/bridge/br_stp_if.c @@ -133,14 +133,8 @@ static int br_stp_call_user(struct net_bridge *br, char *arg) /* call userspace STP and report program errors */ rc = call_usermodehelper(BR_STP_PROG, argv, envp, UMH_WAIT_PROC); - if (rc > 0) { - if (rc & 0xff) - br_debug(br, BR_STP_PROG " received signal %d\n", - rc & 0x7f); - else - br_debug(br, BR_STP_PROG " exited with code %d\n", - (rc >> 8) & 0xff); - } + if (rc != 0) + br_debug(br, BR_STP_PROG " failed with exit code %d\n", rc); return rc; } diff --git a/security/keys/request_key.c b/security/keys/request_key.c index e1b9f1a80676..ff462f3d46ca 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -193,7 +193,7 @@ static int call_sbin_request_key(struct key *authkey, void *aux) ret = call_usermodehelper_keys(request_key, argv, envp, keyring, UMH_WAIT_PROC); kdebug("usermode -> 0x%x", ret); - if (ret >= 0) { + if (ret != 0) { /* ret is the exit/wait code */ if (test_bit(KEY_FLAG_USER_CONSTRUCT, &key->flags) || key_validate(key) < 0)