Message ID | 20240307093827.2307279-1-wintera@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] s390/qeth: handle deferred cc1 | expand |
On 07.03.2024 10:38, Alexandra Winter wrote: > The IO subsystem expects a driver to retry a ccw_device_start, when the > subsequent interrupt response block (irb) contains a deferred > condition code 1. > > Symptoms before this commit: > On the read channel we always trigger the next read anyhow, so no > different behaviour here. > On the write channel we may experience timeout errors, because the > expected reply will never be received without the retry. > Other callers of qeth_send_control_data() may wrongly assume that the ccw > was successful, which may cause problems later. > > Note that since > commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") > and > commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start") > deferred CC1s are more likely to occur. See the commit message of the > latter for more background information. > > Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") > Reference-ID: LTC205042 > Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com>
On Thu, Mar 07, 2024 at 10:38:27AM +0100, Alexandra Winter wrote: > The IO subsystem expects a driver to retry a ccw_device_start, when the > subsequent interrupt response block (irb) contains a deferred > condition code 1. > > Symptoms before this commit: > On the read channel we always trigger the next read anyhow, so no > different behaviour here. > On the write channel we may experience timeout errors, because the > expected reply will never be received without the retry. > Other callers of qeth_send_control_data() may wrongly assume that the ccw > was successful, which may cause problems later. > > Note that since > commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") > and > commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start") > deferred CC1s are more likely to occur. See the commit message of the > latter for more background information. > > Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") > Reference-ID: LTC205042 > Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> > --- > v1->v2: correct patch format > > drivers/s390/net/qeth_core_main.c | 36 +++++++++++++++++++++++++++++-- > 1 file changed, 34 insertions(+), 2 deletions(-) This patch seems to introduce a reference count problem, which results in use-after-free reports like the one below. So, please ignore the patch - this needs to be sorted out first. [ 553.017253] ------------[ cut here ]------------ [ 553.017255] refcount_t: addition on 0; use-after-free. [ 553.017269] WARNING: CPU: 12 PID: 115746 at lib/refcount.c:25 refcount_warn_saturate+0x10e/0x130 [ 553.017275] Modules linked in: kvm algif_hash af_alg nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink dm_service_time mlx5_ib ib_uverbs ib_core uvdevice +s390_trng eadm_sch vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel loop configfs lcs ctcm fsm zfcp scsi_transport_fc mlx5_core ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey +zcrypt rng_core dm_multipath autofs4 [ 553.017331] Unloaded tainted modules: test_klp_state3(K):1 test_klp_state2(K):4 test_klp_state(K):3 test_klp_callbacks_demo2(K):2 test_klp_callbacks_demo(K):12 test_klp_atomic_replace(K):2 test_klp_livepatch(K):6 [last unloaded: kvm] [ 553.017405] CPU: 12 PID: 115746 Comm: qeth_recover Tainted: G W K 6.8.0-20240305.rc7.git1.570c73d6a3df.300.fc39.s390x #1 [ 553.017408] Hardware name: IBM 8561 T01 701 (LPAR) [ 553.017409] Krnl PSW : 0404c00180000000 00000001a53cad02 (refcount_warn_saturate+0x112/0x130) [ 553.017413] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 [ 553.017415] Krnl GPRS: c000000000000027 0000000000000023 000000000000002a 0000000100000000 [ 553.017417] 00000004b38e0900 0000000000000000 0000000095dfc2c0 0000000000001000 [ 553.017419] 00000001a563eb58 000000009938d818 0000000000001007 000000009938d800 [ 553.017421] 0000000095dfac00 0000000000000000 00000001a53cacfe 000003800f24bc10 [ 553.017426] Krnl Code: 00000001a53cacf2: c020004f67d8 larl %r2,00000001a5db7ca2 00000001a53cacf8: c0e5ffc262b4 brasl %r14,00000001a4c17260 #00000001a53cacfe: af000000 mc 0,0 >00000001a53cad02: a7f4ff9a brc 15,00000001a53cac36 00000001a53cad06: 92014008 mvi 8(%r4),1 00000001a53cad0a: c020004f67f6 larl %r2,00000001a5db7cf6 00000001a53cad10: c0e5ffc262a8 brasl %r14,00000001a4c17260 00000001a53cad16: af000000 mc 0,0 [ 553.017439] Call Trace: [ 553.017440] [<00000001a53cad02>] refcount_warn_saturate+0x112/0x130 [ 553.017443] ([<00000001a53cacfe>] refcount_warn_saturate+0x10e/0x130) [ 553.017445] [<00000001a563db88>] __qeth_issue_next_read+0x270/0x288 [ 553.017448] [<00000001a563dc38>] qeth_mpc_initialize.constprop.0+0x98/0x738 [ 553.017449] [<00000001a564007e>] qeth_hardsetup_card+0x36e/0xb38 [ 553.017451] [<00000001a56408d8>] qeth_set_online+0x90/0x3a0 [ 553.017453] [<00000001a5640d04>] qeth_do_reset+0x11c/0x1f8 [ 553.017455] [<00000001a4c47f30>] kthread+0x120/0x128 [ 553.017458] [<00000001a4bc3014>] __ret_from_fork+0x3c/0x58 [ 553.017460] [<00000001a58d717a>] ret_from_fork+0xa/0x30 [ 553.017463] Last Breaking-Event-Address: [ 553.017464] [<00000001a4c172de>] __warn_printk+0x7e/0xf0 [ 553.017467] ---[ end trace 0000000000000000 ]--- [ 553.017474] qeth 0.0.bd00: The qeth device driver failed to recover an error on the device [ 553.018615] qeth 0.0.bd00: The qeth device driver failed to recover an error on the device
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index cf8506d0f185..bc6f3f1f25ba 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -1179,6 +1179,19 @@ static int qeth_check_irb_error(struct qeth_card *card, struct ccw_device *cdev, } } +/** + * qeth_irq() - qeth interrupt handler + * @cdev: ccw device + * @intparm: expect pointer to iob + * @irb: Interruption Response Block + * + * In the good path: + * corresponding qeth channel is locked with last used iob as active_cmd. + * But this function is also called for error interrupts. + * + * Caller ensures that: + * Interrupts are disabled; ccw device lock is held; + */ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, struct irb *irb) { @@ -1220,11 +1233,10 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, iob = (struct qeth_cmd_buffer *) (addr_t)intparm; } - qeth_unlock_channel(card, channel); - rc = qeth_check_irb_error(card, cdev, irb); if (rc) { /* IO was terminated, free its resources. */ + qeth_unlock_channel(card, channel); if (iob) qeth_cancel_cmd(iob, rc); return; @@ -1276,6 +1288,26 @@ static void qeth_irq(struct ccw_device *cdev, unsigned long intparm, } } + if (scsw_cmd_is_valid_cc(&irb->scsw) && irb->scsw.cmd.cc == 1 && iob) { + /* channel command hasn't started: retry. + * active_cmd is still set to last iob + */ + QETH_CARD_TEXT(card, 2, "irqcc1"); + rc = ccw_device_start_timeout(cdev, __ccw_from_cmd(iob), + (addr_t)iob, 0, 0, iob->timeout); + if (rc) { + QETH_DBF_MESSAGE(2, + "ccw retry on %x failed, rc = %i\n", + CARD_DEVID(card), rc); + QETH_CARD_TEXT_(card, 2, " err%d", rc); + qeth_unlock_channel(card, channel); + qeth_cancel_cmd(iob, rc); + } + return; + } + + qeth_unlock_channel(card, channel); + if (iob) { /* sanity check: */ if (irb->scsw.cmd.count > iob->length) {
The IO subsystem expects a driver to retry a ccw_device_start, when the subsequent interrupt response block (irb) contains a deferred condition code 1. Symptoms before this commit: On the read channel we always trigger the next read anyhow, so no different behaviour here. On the write channel we may experience timeout errors, because the expected reply will never be received without the retry. Other callers of qeth_send_control_data() may wrongly assume that the ccw was successful, which may cause problems later. Note that since commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") and commit 5ef1dc40ffa6 ("s390/cio: fix invalid -EBUSY on ccw_device_start") deferred CC1s are more likely to occur. See the commit message of the latter for more background information. Fixes: 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") Reference-ID: LTC205042 Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> --- v1->v2: correct patch format drivers/s390/net/qeth_core_main.c | 36 +++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-)