Message ID | 1486058783.2816.6.camel@sandisk.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Thu, 2 Feb 2017 18:06:51 +0000, Bart Van Assche wrote: > Thanks for the report. Can you repeat your test against the > scsi-target-for-v4.11 branch of > https://git.kernel.org/cgit/linux/kernel/git/bvanassche/linux.git/ (commit > e06fd3154941)? With e06fd3154941 I still see the iscsit_cause_connection_reinstatement + core_tmr_lun_reset deadlock. On the initiator side, it appears that it's the iSCSITMF.LUNResetSimpleAsync test which triggers the bug - MultipathIO.Reset is just the red flag in this case because it blocks waiting for the TMF response. Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-02-02 at 19:58 +0100, David Disseldorp wrote: > On Thu, 2 Feb 2017 18:06:51 +0000, Bart Van Assche wrote: > > > Thanks for the report. Can you repeat your test against the > > scsi-target-for-v4.11 branch of > > https://git.kernel.org/cgit/linux/kernel/git/bvanassche/linux.git/ (commit > > e06fd3154941)? > > With e06fd3154941 I still see the iscsit_cause_connection_reinstatement > + core_tmr_lun_reset deadlock. On the initiator side, it appears that > it's the iSCSITMF.LUNResetSimpleAsync test which triggers the bug - > MultipathIO.Reset is just the red flag in this case because it blocks > waiting for the TMF response. What version of libiscsi are you using? If I try to run that test it gets skipped: $ ./iscsi-test-cu --dataloss --allow-sanitize -t iSCSITMF.LUNResetSimpleAsync iscsi://127.0.0.1/tgt1/0 [SKIPPED] REPORT_SUPPORTED_OPCODES is not implemented. Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[resend to list] On Thu, 2 Feb 2017 19:04:37 +0000, Bart Van Assche wrote: > What version of libiscsi are you using? If I try to run that test it gets > skipped: > > $ ./iscsi-test-cu --dataloss --allow-sanitize -t iSCSITMF.LUNResetSimpleAsync iscsi://127.0.0.1/tgt1/0 > [SKIPPED] REPORT_SUPPORTED_OPCODES is not implemented. Current master - the REPORT_SUPPORTED_OPCODES skip can be ignored... the full test name needs a prefix: --test=ALL.iSCSITMF.LUNResetSimpleAsync or iSCSI.iSCSITMF.LUNResetSimpleAsync Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-02-02 at 20:10 +0100, David Disseldorp wrote: > Current master - the REPORT_SUPPORTED_OPCODES skip can be ignored... > the full test name needs a prefix: > --test=ALL.iSCSITMF.LUNResetSimpleAsync > or > iSCSI.iSCSITMF.LUNResetSimpleAsync Even if I run that test a few hundred times in a loop "echo w > /proc/sysrq-trigger" still doesn't report a lockup. I would appreciate it if you could provide me sufficient information to reproduce the behavior you reported. Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2 Feb 2017 19:16:13 +0000, Bart Van Assche wrote: > On Thu, 2017-02-02 at 20:10 +0100, David Disseldorp wrote: > > Current master - the REPORT_SUPPORTED_OPCODES skip can be ignored... > > the full test name needs a prefix: > > --test=ALL.iSCSITMF.LUNResetSimpleAsync > > or > > iSCSI.iSCSITMF.LUNResetSimpleAsync > > Even if I run that test a few hundred times in a loop "echo w > > /proc/sysrq-trigger" still doesn't report a lockup. I would appreciate it if > you could provide me sufficient information to reproduce the behavior you > reported. But do you see the stuck core_tmr_lun_reset and iscsit_cause_connection_reinstatement threads? I could have worded my report better. The "lockup" I was referring to was the initiator awaiting TMF response and never getting one - which doesn't happen with mainline. This is easiest to reproduce with: iscsi-test-cu --test=ALL.iSCSITMF.LUNResetSimpleAsync ... and then iscsi-test-cu --test=ALL.MultipathIO.Reset ... (you'll need to provide your IQN twice for MPIO) If you run MultipathIO.Reset before iSCSITMF.LUNResetSimpleAsync then it runs fine (ignoring the test failure), if you run it afterwards then it blocks indefinitely, as LIO never sends a TMF response. Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-02-02 at 20:34 +0100, David Disseldorp wrote: > But do you see the stuck core_tmr_lun_reset and > iscsit_cause_connection_reinstatement threads? Sorry but I don't see that. > I could have worded my report better. The "lockup" I was referring to > was the initiator awaiting TMF response and never getting one - which > doesn't happen with mainline. > > This is easiest to reproduce with: > iscsi-test-cu --test=ALL.iSCSITMF.LUNResetSimpleAsync ... > and then > iscsi-test-cu --test=ALL.MultipathIO.Reset ... (you'll need to provide > your IQN twice for MPIO) > > If you run MultipathIO.Reset before iSCSITMF.LUNResetSimpleAsync then > it runs fine (ignoring the test failure), if you run it afterwards then > it blocks indefinitely, as LIO never sends a TMF response. Sorry but even with that test sequence I don't see the initiator waiting longer than expected. As one can see below both tests complete in a few milliseconds: $ ./iscsi-test-cu --dataloss -t iSCSI.iSCSITMF.LUNResetSimpleAsync iscsi://127.0.0.1/tgt1/0 && ./iscsi-test-cu --dataloss -t ALL.MultipathIO.Reset iscsi://127.0.0.1/tgt1/0 iscsi://127.0.0.1/tgt1/0 [SKIPPED] REPORT_SUPPORTED_OPCODES is not implemented. CUnit - A unit testing framework for C - Version 2.1-3 http://cunit.sourceforge.net/ Suite: iSCSITMF Test: LUNResetSimpleAsync ...passed Run Summary: Type Total Ran Passed Failed Inactive suites 1 1 n/a 0 0 tests 1 1 1 0 0 asserts 16 16 16 0 n/a Elapsed time = 0.001 seconds Tests completed with return value: 0 skipping non-LU designator: 2 skipping non-LU designator: 1 skipping unsupported des type: 6 skipping non-LU designator: 1 skipping non-LU designator: 1 skipping unsupported des type: 1 skipping non-LU designator: 2 skipping non-LU designator: 1 skipping unsupported des type: 6 skipping non-LU designator: 1 skipping non-LU designator: 1 skipping unsupported des type: 1 found matching LU device identifier for all (2) paths [SKIPPED] REPORT_SUPPORTED_OPCODES is not implemented. CUnit - A unit testing framework for C - Version 2.1-3 http://cunit.sourceforge.net/ Suite: MultipathIO Test: Reset ...FAILED 1. test_multipathio_reset.c:70 - CU_ASSERT_NOT_EQUAL(num_uas,0) 2. test_multipathio_reset.c:70 - CU_ASSERT_NOT_EQUAL(num_uas,0) Run Summary: Type Total Ran Passed Failed Inactive suites 1 1 n/a 0 0 tests 1 1 0 1 0 asserts 12 12 10 2 n/a Elapsed time = 0.002 seconds Tests completed with return value: 0-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, 2017-02-02 at 20:34 +0100, David Disseldorp wrote: > The "lockup" I was referring to was the initiator awaiting TMF response > and never getting one - which doesn't happen with mainline. Hello David, It would be easy to modify the code such that a response is sent for all LU RESET TMFs. However, in SAM-5 section 6.3.3 LU RESET I found the following: When responding to a logical unit reset condition, the logical unit shall: [ ... ] c) terminate all task management functions. [ ... ] Your test submits two concurrent LU RESETs to the same logical unit. So I think sending a response for only one of these two LU RESETs is compliant with the SCSI specs. Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Feb 03, 2017 at 05:20:06PM +0000, Bart Van Assche wrote: > On Thu, 2017-02-02 at 20:34 +0100, David Disseldorp wrote: > > The "lockup" I was referring to was the initiator awaiting TMF response > > and never getting one - which doesn't happen with mainline. > > Hello David, > > It would be easy to modify the code such that a response is sent for all LU > RESET TMFs. Please do so - sudden changes in behavior that we don't have a good reason for are always a bad thing. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Bart, On Fri, 3 Feb 2017 17:20:06 +0000, Bart Van Assche wrote: > It would be easy to modify the code such that a response is sent for all LU > RESET TMFs. However, in SAM-5 section 6.3.3 LU RESET I found the following: > > When responding to a logical unit reset condition, the logical unit shall: > [ ... ] > c) terminate all task management functions. > [ ... ] > > Your test submits two concurrent LU RESETs to the same logical unit. So I > think sending a response for only one of these two LU RESETs is compliant > with the SCSI specs. I don't there are any cases where the LU RESETs are concurrent: - test_multipathio_reset.c issues an LU RESET on each of the two paths, but the TMF requests are sent synchronously. - test_async_lu_reset_simple.c dispatches a bunch of WRITE10s and then a single async LU RESET while the write I/Os are outstanding. I'll do some network sniffing today, and upload the mainline and e06fd3154941 captures for comparison. Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 6 Feb 2017 11:37:06 +0100, David Disseldorp wrote: > Hi Bart, > > On Fri, 3 Feb 2017 17:20:06 +0000, Bart Van Assche wrote: > > > It would be easy to modify the code such that a response is sent for all LU > > RESET TMFs. However, in SAM-5 section 6.3.3 LU RESET I found the following: > > > > When responding to a logical unit reset condition, the logical unit shall: > > [ ... ] > > c) terminate all task management functions. > > [ ... ] > > > > Your test submits two concurrent LU RESETs to the same logical unit. So I > > think sending a response for only one of these two LU RESETs is compliant > > with the SCSI specs. > > I don't there are any cases where the LU RESETs are concurrent: > - test_multipathio_reset.c issues an LU RESET on each of the two paths, > but the TMF requests are sent synchronously. > - test_async_lu_reset_simple.c dispatches a bunch of WRITE10s and then a > single async LU RESET while the write I/Os are outstanding. s/bunch of WRITE10s/single WRITE10/ > I'll do some network sniffing today, and upload the mainline and > e06fd3154941 captures for comparison. Please find the traces at: https://www.samba.org/~ddiss/netcap/lio_tmf_sync/ The 4.10.0-rc7 trace shows an immediate TMF response. In the e06fd3154941 trace, the initiator waits six seconds for the LU RESET response before logging out (the TMF response never comes). Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2017-02-06 at 16:58 +0100, David Disseldorp wrote: > On Mon, 6 Feb 2017 11:37:06 +0100, David Disseldorp wrote: > > I'll do some network sniffing today, and upload the mainline and > > e06fd3154941 captures for comparison. > > Please find the traces at: > https://www.samba.org/~ddiss/netcap/lio_tmf_sync/ > > The 4.10.0-rc7 trace shows an immediate TMF response. In the > e06fd3154941 trace, the initiator waits six seconds for the LU RESET > response before logging out (the TMF response never comes). Can you also upload the output of (cd /sys/kernel/config/target && find -type f | xargs grep -aH '') ? We might have been using different settings. Thanks, Bart. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 6 Feb 2017 16:01:42 +0000, Bart Van Assche wrote: > > Please find the traces at: > > https://www.samba.org/~ddiss/netcap/lio_tmf_sync/ > > > > The 4.10.0-rc7 trace shows an immediate TMF response. In the > > e06fd3154941 trace, the initiator waits six seconds for the LU RESET > > response before logging out (the TMF response never comes). > > Can you also upload the output of (cd /sys/kernel/config/target && find > -type f | xargs grep -aH '') ? We might have been using different settings. https://www.samba.org/~ddiss/netcap/lio_tmf_sync/lio_configfs.txt FWIW, this was configured using the script at: https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2017-02-06 at 17:44 +0100, David Disseldorp wrote: > FWIW, this was configured using the script at: > https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh Hello David, Thanks for having provided that script, that's very helpful. I ran that script after I had entered the following: _fatal() { exit 1 } DYN_DEBUG_MODULES= DYN_DEBUG_FILES= INITIATOR_IQNS=" iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test-2 " TARGET_IQN=tgt1 IP_ADDR1=$(ip addr show dev eth0 | sed -n 's,^[[:blank:]]*inet \([^/]*\)/.*$,\1,p') MAC_ADDR1= IP_ADDR2= MAC_ADDR2=foobar Next, I ran the two libiscsi tests mentioned earlier: for ((i=0;i<100;i++)); do for t in ALL.iSCSITMF.LUNResetSimpleAsync ALL.MultipathIO.Reset; do iscsi-test-cu --dataloss --allow-sanitize -t $t iscsi://$IP_ADDR1/tgt1/0 iscsi://$IP_ADDR1/tgt1/0 done done That loop completed in about five seconds. Sorry but that means that I am still unable to reproduce the missing TMF reply that you have reported. Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Feb 06, 2017 at 07:35:47PM +0000, Bart Van Assche wrote: > That loop completed in about five seconds. Sorry but that means that I am still > unable to reproduce the missing TMF reply that you have reported. It might be worth to compare the kernel preemption settings and how many cpus each of you has. And check the exact libiscsi version to be sure. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 2017-02-06 at 01:01 -0800, Christoph Hellwig wrote: > Sudden changes in behavior that we don't have a good reason for are always > a bad thing. Agreed. But if the missing LU RESET response would be a change that has been introduced by this patch series then I expect that only a small change will be needed to bring that response back. Whether or not a response is sent for a command or TMF that has been aborted is controlled by a single variable (send_abort_response). If we would not be able to figure out the root cause before the v4.11 merge window opens this is something we can still address after this patch series went upstream. Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, 6 Feb 2017 19:35:47 +0000, Bart Van Assche wrote: > On Mon, 2017-02-06 at 17:44 +0100, David Disseldorp wrote: > > FWIW, this was configured using the script at: > > https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh > > Hello David, > > Thanks for having provided that script, that's very helpful. I ran that script > after I had entered the following: > > _fatal() { > exit 1 > } > > DYN_DEBUG_MODULES= > DYN_DEBUG_FILES= > INITIATOR_IQNS=" > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test-2 > " > TARGET_IQN=tgt1 > IP_ADDR1=$(ip addr show dev eth0 | sed -n 's,^[[:blank:]]*inet \([^/]*\)/.*$,\1,p') > MAC_ADDR1= > IP_ADDR2= > MAC_ADDR2=foobar > > Next, I ran the two libiscsi tests mentioned earlier: > > for ((i=0;i<100;i++)); do > for t in ALL.iSCSITMF.LUNResetSimpleAsync ALL.MultipathIO.Reset; do > iscsi-test-cu --dataloss --allow-sanitize -t $t iscsi://$IP_ADDR1/tgt1/0 iscsi://$IP_ADDR1/tgt1/0 > done > done > > That loop completed in about five seconds. Sorry but that means that I am still > unable to reproduce the missing TMF reply that you have reported. Aha - If I run the test against a fileio backed LU then it passes, it fails against either of the iblock backed LUs. Perhaps this race is dependent on the I/O making it to the backstore/block layer by the time the LU RESET request comes in? In the past I hit a bug similar to this (in the ABORT TASK path), and used the dm-delay device (setup by the script) to trip the race. Do you see the failure when testing against LUN1 or LUN2? Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-02-07 at 11:59 +0100, David Disseldorp wrote: > On Mon, 6 Feb 2017 19:35:47 +0000, Bart Van Assche wrote: > > > On Mon, 2017-02-06 at 17:44 +0100, David Disseldorp wrote: > > > FWIW, this was configured using the script at: > > > https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh > > > > Hello David, > > > > Thanks for having provided that script, that's very helpful. I ran that script > > after I had entered the following: > > > > _fatal() { > > exit 1 > > } > > > > DYN_DEBUG_MODULES= > > DYN_DEBUG_FILES= > > INITIATOR_IQNS=" > > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test > > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test-2 > > " > > TARGET_IQN=tgt1 > > IP_ADDR1=$(ip addr show dev eth0 | sed -n 's,^[[:blank:]]*inet \([^/]*\)/.*$,\1,p') > > MAC_ADDR1= > > IP_ADDR2= > > MAC_ADDR2=foobar > > > > Next, I ran the two libiscsi tests mentioned earlier: > > > > for ((i=0;i<100;i++)); do > > for t in ALL.iSCSITMF.LUNResetSimpleAsync ALL.MultipathIO.Reset; do > > iscsi-test-cu --dataloss --allow-sanitize -t $t iscsi://$IP_ADDR1/tgt1/0 iscsi://$IP_ADDR1/tgt1/0 > > done > > done > > > > That loop completed in about five seconds. Sorry but that means that I am still > > unable to reproduce the missing TMF reply that you have reported. > > Aha - If I run the test against a fileio backed LU then it passes, it > fails against either of the iblock backed LUs. That is because all FILEIO backend I/O is synchronous, so no se_cmd descriptors are ever hitting CMD_T_ABORTED for ABORT_TASK or LUN_RESET in your test. ;) > Perhaps this race is > dependent on the I/O making it to the backstore/block layer by the time > the LU RESET request comes in? In the past I hit a bug similar to this > (in the ABORT TASK path), and used the dm-delay device (setup by the > script) to trip the race. > > Do you see the failure when testing against LUN1 or LUN2? The fatal flaw with patch #19 is the new se_cmd->finished completion introduced to handle all CMD_T_ABORTED cases can never make forward progress in any case, because CMD_T_ABORTED logic takes it's own se_cmd->cmd_kref in __target_check_io_state(), and then blocks on wait_for_completion_timeout(&se_cmd->finished). In order to complete se_cmd->finished, se_cmd->cmd_kref must reach zero to call target_release_cmd_kref() -> complete_all(&se_cmd->finished), but since the tmr kthread caller who is blocked on se_cmd->finished holds the final se_cmd->cmd_kref reference, it's fatal for the simple first order scenario every time. Patch #19 + #20 breaks the second order issue where CMD_T_ABORTED happens concurrently with se_session shutdown CMD_T_FABRIC_STOP too. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-02-07 at 08:12 -0800, Nicholas A. Bellinger wrote: > On Tue, 2017-02-07 at 11:59 +0100, David Disseldorp wrote: > > On Mon, 6 Feb 2017 19:35:47 +0000, Bart Van Assche wrote: > > > > > On Mon, 2017-02-06 at 17:44 +0100, David Disseldorp wrote: > > > > FWIW, this was configured using the script at: > > > > https://github.com/ddiss/rapido/blob/master/lio_local_autorun.sh > > > > > > Hello David, > > > > > > Thanks for having provided that script, that's very helpful. I ran that script > > > after I had entered the following: > > > > > > _fatal() { > > > exit 1 > > > } > > > > > > DYN_DEBUG_MODULES= > > > DYN_DEBUG_FILES= > > > INITIATOR_IQNS=" > > > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test > > > iqn.2007-10.com.github:sahlberg:libiscsi:iscsi-test-2 > > > " > > > TARGET_IQN=tgt1 > > > IP_ADDR1=$(ip addr show dev eth0 | sed -n 's,^[[:blank:]]*inet \([^/]*\)/.*$,\1,p') > > > MAC_ADDR1= > > > IP_ADDR2= > > > MAC_ADDR2=foobar > > > > > > Next, I ran the two libiscsi tests mentioned earlier: > > > > > > for ((i=0;i<100;i++)); do > > > for t in ALL.iSCSITMF.LUNResetSimpleAsync ALL.MultipathIO.Reset; do > > > iscsi-test-cu --dataloss --allow-sanitize -t $t iscsi://$IP_ADDR1/tgt1/0 iscsi://$IP_ADDR1/tgt1/0 > > > done > > > done > > > > > > That loop completed in about five seconds. Sorry but that means that I am still > > > unable to reproduce the missing TMF reply that you have reported. > > > > Aha - If I run the test against a fileio backed LU then it passes, it > > fails against either of the iblock backed LUs. > > That is because all FILEIO backend I/O is synchronous, so no se_cmd > descriptors are ever hitting CMD_T_ABORTED for ABORT_TASK or LUN_RESET > in your test. ;) > Btw, a real simple way to trigger these bugs is to use a IBLOCK backend that doesn't complete BIOs back to target-core for an extended amount of time (say 180s seconds). The delay will result in open-iscsi issuing a ABORT_TASK that blocks waiting for backend I/O completion (first order issue), followed by session reinstatement (second order issue). As-is this series will create a bunch of hung tasks stuck in un-interruptible sleep (eg: 'D' state) forever. Since you've already verified with v4.10 is working as expected with your initial TMR LUN_RESET test case , it would be very useful to verify the first and second order issues work as expected in mainline with this type of pathological case, vs. the changes proposed here. -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, 2017-02-07 at 11:59 +0100, David Disseldorp wrote: > Aha - If I run the test against a fileio backed LU then it passes, it > fails against either of the iblock backed LUs. Perhaps this race is > dependent on the I/O making it to the backstore/block layer by the time > the LU RESET request comes in? In the past I hit a bug similar to this > (in the ABORT TASK path), and used the dm-delay device (setup by the > script) to trip the race. > > Do you see the failure when testing against LUN1 or LUN2? That's a good catch. I have integrated a fix in my scsi-target-for-v4.11 branch on kernel.org. Would it be possible to repeat your test against that code base (commit acb1c8eca422)? Thanks, Bart.-- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, 8 Feb 2017 01:46:04 +0000, Bart Van Assche wrote: > > Do you see the failure when testing against LUN1 or LUN2? > > That's a good catch. I have integrated a fix in my scsi-target-for-v4.11 > branch on kernel.org. Would it be possible to repeat your test against that > code base (commit acb1c8eca422)? I can confirm that acb1c8eca422 fixes the stuck core_tmr_lun_reset and iscsit_cause_connection_reinstatement threads - thanks. Cheers, David -- To unsubscribe from this list: send the line "unsubscribe target-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index 9d1f4734c950..1b7988407ef6 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -519,7 +519,8 @@ void transport_free_session(struct se_session *se_sess) se_sess->se_node_acl = NULL; target_put_nacl(se_nacl); } - destroy_workqueue(se_sess->tmf_wq); + if (se_sess->tmf_wq) + destroy_workqueue(se_sess->tmf_wq); if (se_sess->sess_cmd_map) { percpu_ida_destroy(&se_sess->sess_tag_pool); kvfree(se_sess->sess_cmd_map);