diff mbox

Revert "block: Add warning for bi_next not NULL in bio_endio()"

Message ID 20180522235505.20937-1-bart.vanassche@wdc.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bart Van Assche May 22, 2018, 11:55 p.m. UTC
This patch avoids that KASAN reports the following complaint when
running the srp-test software:

BUG: KASAN: use-after-free in bio_advance+0x110/0x1b0
Read of size 4 at addr ffff88014c7aa950 by task ksoftirqd/10/72
Call Trace:
dump_stack+0x9a/0xeb
print_address_description+0x65/0x270
kasan_report+0x232/0x350
bio_advance+0x110/0x1b0
blk_update_request+0x9d/0x5a0
scsi_end_request+0x4c/0x300 [scsi_mod]
scsi_io_completion+0x71e/0xa40 [scsi_mod]
__blk_mq_complete_request+0x13e/0x220
srp_recv_done+0x454/0x1100 [ib_srp]
__ib_process_cq+0x9a/0xf0 [ib_core]
ib_poll_handler+0x2d/0x90 [ib_core]
irq_poll_softirq+0xe5/0x1e0
__do_softirq+0x112/0x5f0
run_ksoftirqd+0x29/0x50
smpboot_thread_fn+0x30f/0x410
kthread+0x1b2/0x1d0
ret_from_fork+0x24/0x30

This reverts commit 0ba99ca4838bc75481a4bf0e70bad20b0a5457c7.

Fixes: commit 45db54d58de0 ("block: Split out bio_list_copy_data()")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
---
 block/bio.c      | 3 ---
 block/blk-core.c | 8 +-------
 2 files changed, 1 insertion(+), 10 deletions(-)

Comments

Kent Overstreet May 23, 2018, 1:30 a.m. UTC | #1
On Tue, May 22, 2018 at 04:55:05PM -0700, Bart Van Assche wrote:
> This patch avoids that KASAN reports the following complaint when
> running the srp-test software:

I made a real attempt to get your test suite working. It would be nice if you
could make an attempt at tracking it down instead of just reverting, and holding
up improvements to the whole block layer, since you seem to have an easy repro
for it.

This is a real looming issue - we have other code which assumes things about
bi_next and this isn't a practical thing to grep/audit for.
Bart Van Assche May 23, 2018, 1:36 a.m. UTC | #2
On Tue, 2018-05-22 at 21:30 -0400, Kent Overstreet wrote:
> On Tue, May 22, 2018 at 04:55:05PM -0700, Bart Van Assche wrote:

> > This patch avoids that KASAN reports the following complaint when

> > running the srp-test software:

> 

> I made a real attempt to get your test suite working.


Please try a little harder. Everyone I asked to run that test suite
has been able to run it.

Bart.
Kent Overstreet May 23, 2018, 2:16 a.m. UTC | #3
On Wed, May 23, 2018 at 01:36:02AM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-22 at 21:30 -0400, Kent Overstreet wrote:
> > On Tue, May 22, 2018 at 04:55:05PM -0700, Bart Van Assche wrote:
> > > This patch avoids that KASAN reports the following complaint when
> > > running the srp-test software:
> > 
> > I made a real attempt to get your test suite working.
> 
> Please try a little harder. Everyone I asked to run that test suite
> has been able to run it.
> 
> Bart.
> 
> 
> 

========= TEST   srp

systemd-journald[143]: Received SIGTERM from PID 1 (systemd).
Removed /etc/systemd/system/multipathd.service.
Starting multipath-tools (via systemctl): multipath-tools.service.
multipathd> reconfigure
ok
multipathd> make -C discontiguous-io discontiguous-io
make[1]: Entering directory
'/host/home/kent/ktest/tests/srp-test/discontiguous-io'
make[1]: 'discontiguous-io' is up to date.
make[1]: Leaving directory
'/host/home/kent/ktest/tests/srp-test/discontiguous-io'
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module
../run_tests: line 65: cd: /lib/modules/4.17.0-rc4+/kernel/block: No such file
or directory
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Configured SRP target driver
Running test ../tests/01 ...
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/01 failed
Running test ../tests/02-mq ...
Test file I/O on top of multipath concurrently with logout and login (10 min;
mq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-mq failed
Running test ../tests/02-sq ...
Test file I/O on top of multipath concurrently with logout and login (10 min;
sq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-sq failed
Running test ../tests/02-sq-on-mq ...
Test file I/O on top of multipath concurrently with logout and login (10 min;
sq-on-mq)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/02-sq-on-mq failed
Running test ../tests/03-4M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/03-4M failed
Running test ../tests/03-8M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/03-8M failed
Running test ../tests/04-4M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=1
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/04-4M failed
Running test ../tests/04-8M ...
Test direct I/O with large transfer sizes and cmd_sg_entries=1
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/04-8M failed
Running test ../tests/05-4M ...
Test buffered I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/05-4M failed
Running test ../tests/05-8M ...
Test buffered I/O with large transfer sizes and cmd_sg_entries=255
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/05-8M failed
Running test ../tests/06 ...
Test block I/O on top of multipath concurrently with logout and login (10 min)
Unloaded the ib_srp kernel module
SRP login failed
Test ../tests/06 failed
0 tests succeeded and 11 tests failed
Unloaded the ib_srpt kernel module
Unloaded the rdma_rxe kernel module

========= PASSED srp in 14s


May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: --------start up--------
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: read /etc/multipath.conf
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: path checkers start up
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: failed to get
callout uid: Input/output error
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: fail to get serial
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: failed to get
callout uid: Input/output error
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdc: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdc: failed to get
callout uid: Input/output error
May 22 22:13:46 moria-kvm-kvm-kvm-kvm systemd[1]: Started Device-Mapper
Multipath Device Controller.
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: reconfigure (operator)
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: failed to get
callout uid: Input/output error
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: fail to get serial
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdb: failed to get
callout uid: Input/output error
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdc: using deprecated
getuid callout
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
/.../srp-test/bin/getuid_callout exited with 255
May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sdc: failed to get
callout uid: Input/output error
Bart Van Assche June 4, 2018, 8:59 a.m. UTC | #4
On Tue, 2018-05-22 at 22:16 -0400, Kent Overstreet wrote:
> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial

> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated

> getuid callout

> May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:

> /.../srp-test/bin/getuid_callout exited with 255


(back from traveling)

Hello Kent,

Please comment out the getuid_callout line from /etc/multipath.conf and try again.
Changing the "/..." part into the path of the srp-test software should also help.
That line is a left-over from the time when the NVMe initiator drivers supported
dm-multipath. I will update the srp-test README.md to make this more clear.

Thanks,

Bart.
Kent Overstreet June 4, 2018, 10:41 p.m. UTC | #5
On Mon, Jun 04, 2018 at 08:59:39AM +0000, Bart Van Assche wrote:
> On Tue, 2018-05-22 at 22:16 -0400, Kent Overstreet wrote:
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: fail to get serial
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]: sda: using deprecated
> > getuid callout
> > May 22 22:13:46 moria-kvm-kvm-kvm-kvm multipathd[387]:
> > /.../srp-test/bin/getuid_callout exited with 255
> 
> (back from traveling)
> 
> Hello Kent,
> 
> Please comment out the getuid_callout line from /etc/multipath.conf and try again.
> Changing the "/..." part into the path of the srp-test software should also help.
> That line is a left-over from the time when the NVMe initiator drivers supported
> dm-multipath. I will update the srp-test README.md to make this more clear.

Deleted that line and it still doesn't work (I also tried fixing that line so
the path was correct, but got the same output as what I sent you earlier). With
that line deleted, the output is different. test output still says SRP login
failed, syslog says:

Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: --------start up--------
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: read /etc/multipath.conf
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: path checkers start up
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sda: fail to get serial
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sdb: fail to get serial
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: HDIO_GETGEO failed with 25
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: failed to get udev uid: Invalid argument
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: HDIO_GETGEO failed with 25
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: failed to get udev uid: Invalid argument
Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd10: HDIO_GETGEO failed with 25

etc.
Bart Van Assche June 5, 2018, 2:38 p.m. UTC | #6
On Mon, 2018-06-04 at 18:41 -0400, Kent Overstreet wrote:
> On Mon, Jun 04, 2018 at 08:59:39AM +0000, Bart Van Assche wrote:

> > Please comment out the getuid_callout line from /etc/multipath.conf and try again.

> > Changing the "/..." part into the path of the srp-test software should also help.

> > That line is a left-over from the time when the NVMe initiator drivers supported

> > dm-multipath. I will update the srp-test README.md to make this more clear.

> 

> Deleted that line and it still doesn't work (I also tried fixing that line so

> the path was correct, but got the same output as what I sent you earlier). With

> that line deleted, the output is different. test output still says SRP login

> failed, syslog says:

> 

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: --------start up--------

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: read /etc/multipath.conf

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: path checkers start up

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sda: fail to get serial

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: sdb: fail to get serial

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: HDIO_GETGEO failed with 25

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd0: failed to get udev uid: Invalid argument

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: HDIO_GETGEO failed with 25

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd1: failed to get udev uid: Invalid argument

> Jun 04 18:35:31 moria-kvm-kvm-kvm-kvm-kvm multipathd[453]: nbd10: HDIO_GETGEO failed with 25


Thanks for having shared this output. In the multipathd source code I found that
the error message "fail to get serial" is produced by the following code:

	if (get_vpd_sysfs(parent, 0x80, pp->serial, SERIAL_SIZE) <= 0) {
		if (get_serial(pp->serial, SERIAL_SIZE, pp->fd)) {
			condlog(2, "%s: fail to get serial", pp->dev);
			return 0;
		}
	}

The message "fail to get serial" is only logged if reading
/sys/class/scsi_device/*/*/*/*/vpd_pg80 fails *and* submitting a SCSI INQUIRY
command through the SG_IO interface fails. This is something I haven't encountered
before. Is there something special about the VM? Was e.g. multipathd configured
to run as another user than root? Have some security settings been configured that
prevent multipathd from accessing sysfs attributes? Was SCSI generic support
disabled (CHR_DEV_SG)?

Thanks,

Bart.
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index 0a4df92cd689..e22ebab450f8 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1777,9 +1777,6 @@  void bio_endio(struct bio *bio)
 	if (!bio_integrity_endio(bio))
 		return;
 
-	if (WARN_ONCE(bio->bi_next, "driver left bi_next not NULL"))
-		bio->bi_next = NULL;
-
 	/*
 	 * Need to have a real endio function for chained bios, otherwise
 	 * various corner cases will break (like stacking block devices that
diff --git a/block/blk-core.c b/block/blk-core.c
index a216b8b137f4..4d69f91a6431 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -277,10 +277,6 @@  static void req_bio_endio(struct request *rq, struct bio *bio,
 	bio_advance(bio, nbytes);
 
 	/* don't actually finish bio if it's part of flush sequence */
-	/*
-	 * XXX this code looks suspicious - it's not consistent with advancing
-	 * req->bio in caller
-	 */
 	if (bio->bi_iter.bi_size == 0 && !(rq->rq_flags & RQF_FLUSH_SEQ))
 		bio_endio(bio);
 }
@@ -3115,10 +3111,8 @@  bool blk_update_request(struct request *req, blk_status_t error,
 		struct bio *bio = req->bio;
 		unsigned bio_bytes = min(bio->bi_iter.bi_size, nr_bytes);
 
-		if (bio_bytes == bio->bi_iter.bi_size) {
+		if (bio_bytes == bio->bi_iter.bi_size)
 			req->bio = bio->bi_next;
-			bio->bi_next = NULL;
-		}
 
 		/* Completion has already been traced */
 		bio_clear_flag(bio, BIO_TRACE_COMPLETION);