diff mbox

bio: ensure __bio_clone_fast copies bi_partno

Message ID 20171117074725.22536-1-mlyle@lyle.org (mailing list archive)
State New, archived
Headers show

Commit Message

Michael Lyle Nov. 17, 2017, 7:47 a.m. UTC
A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a,
bi_partno, instead of using bdev->bd_contains and encoding the partition
information in the bi_bdev field.  __bio_clone_fast was changed to copy
the disk information, but not the partition information.  At minimum,
this regressed bcache and caused data corruption.

Signed-off-by: Michael Lyle <mlyle@lyle.org>
Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a
Reported-by: Pavel Goran <via-bcache@pvgoran.name>
Reported-by: Campbell Steven <casteven@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
---
 block/bio.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Coly Li Nov. 17, 2017, 11:04 a.m. UTC | #1
On 17/11/2017 3:47 PM, Michael Lyle wrote:
> A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a,
> bi_partno, instead of using bdev->bd_contains and encoding the partition
> information in the bi_bdev field.  __bio_clone_fast was changed to copy
> the disk information, but not the partition information.  At minimum,
> this regressed bcache and caused data corruption.
> 

Hi Michael,

Thanks for the fix, it looks good to me.

> Signed-off-by: Michael Lyle <mlyle@lyle.org>
> Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a
> Reported-by: Pavel Goran <via-bcache@pvgoran.name>
> Reported-by: Campbell Steven <casteven@gmail.com>

Reviewed-by: Coly Li <colyli@suse.de>

Coly Li

> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: <stable@vger.kernel.org>
> ---
>  block/bio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 101c2a9b5481..33fa6b4af312 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
>  	 * so we don't set nor calculate new physical/hw segment counts here
>  	 */
>  	bio->bi_disk = bio_src->bi_disk;
> +	bio->bi_partno = bio_src->bi_partno;
>  	bio_set_flag(bio, BIO_CLONED);
>  	bio->bi_opf = bio_src->bi_opf;
>  	bio->bi_write_hint = bio_src->bi_write_hint;
>
Ming Lei Nov. 17, 2017, 11:34 a.m. UTC | #2
On Thu, Nov 16, 2017 at 11:47:25PM -0800, Michael Lyle wrote:
> A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a,
> bi_partno, instead of using bdev->bd_contains and encoding the partition
> information in the bi_bdev field.  __bio_clone_fast was changed to copy
> the disk information, but not the partition information.  At minimum,
> this regressed bcache and caused data corruption.
> 
> Signed-off-by: Michael Lyle <mlyle@lyle.org>
> Fixes: 74d46992e0d9dee7f1f376de0d56d31614c8a17a
> Reported-by: Pavel Goran <via-bcache@pvgoran.name>
> Reported-by: Campbell Steven <casteven@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Jens Axboe <axboe@kernel.dk>
> Cc: <stable@vger.kernel.org>
> ---
>  block/bio.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 101c2a9b5481..33fa6b4af312 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
>  	 * so we don't set nor calculate new physical/hw segment counts here
>  	 */
>  	bio->bi_disk = bio_src->bi_disk;
> +	bio->bi_partno = bio_src->bi_partno;
>  	bio_set_flag(bio, BIO_CLONED);
>  	bio->bi_opf = bio_src->bi_opf;
>  	bio->bi_write_hint = bio_src->bi_write_hint;

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Christoph Hellwig Nov. 17, 2017, 3:08 p.m. UTC | #3
Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>
Jens Axboe Nov. 17, 2017, 3:27 p.m. UTC | #4
On 11/17/2017 12:47 AM, Michael Lyle wrote:
> A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a,
> bi_partno, instead of using bdev->bd_contains and encoding the partition
> information in the bi_bdev field.  __bio_clone_fast was changed to copy
> the disk information, but not the partition information.  At minimum,
> this regressed bcache and caused data corruption.

That's not good... Fix looks good to me, I'll queue this up for a pull
today. Thanks for bisecting this, Michael.
Bart Van Assche Nov. 17, 2017, 4:50 p.m. UTC | #5
On Thu, 2017-11-16 at 23:47 -0800, Michael Lyle wrote:
> diff --git a/block/bio.c b/block/bio.c

> index 101c2a9b5481..33fa6b4af312 100644

> --- a/block/bio.c

> +++ b/block/bio.c

> @@ -597,6 +597,7 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)

>  	 * so we don't set nor calculate new physical/hw segment counts here

>  	 */

>  	bio->bi_disk = bio_src->bi_disk;

> +	bio->bi_partno = bio_src->bi_partno;

>  	bio_set_flag(bio, BIO_CLONED);

>  	bio->bi_opf = bio_src->bi_opf;

>  	bio->bi_write_hint = bio_src->bi_write_hint;


Have you considered to use bio_copy_dev() instead of open-coding it?

Additionally, there is more code that copies these fields, e.g. the code in
bio_clone_bioset(). Shouldn't that code be modified such that it also copies
bi_partno?

How about the following class of assignments in drivers/md/raid1.c:

		mbio->bi_disk = (void *)conf->mirrors[i].rdev;

Should these assignments perhaps be followed by a mbio->bi_partno assignment?

How about the following class of assignments in the NVMe code:

               bio->bi_disk = disk;

Should these assignments perhaps be followed by a bio->bi_partno assignment?

Thanks,

Bart.
Michael Lyle Nov. 17, 2017, 5:02 p.m. UTC | #6
Jens & everyone-- thanks for the speedy review and handling.  I've
updated my test cases to ensure that volumes from old releases work,
even when I "don't think" there's been a disk format change.

Bart--

On 11/17/2017 08:50 AM, Bart Van Assche wrote:
> Have you considered to use bio_copy_dev() instead of open-coding it?

One could... Right now almost all the uses of bio_copy_dev are in bcache
and they need to change for other reasons.  (e.g. macro uses parameter
more than once, function is passed in as parameter).  There's a whole
lot of places to change if it's desired to make bio_copy_dev universally
used to copy device information.

> Additionally, there is more code that copies these fields, e.g. the code in
> bio_clone_bioset(). Shouldn't that code be modified such that it also copies
> bi_partno?

Yes, when I was grepping around there were other things that looked
possibly unsafe.  I don't have test environments for all of these other
subsystems.

I wanted to get the minimal fix for this in, though, because people are
actively losing data to the problem it triggers with bcache.

Mike
Christoph Hellwig Nov. 17, 2017, 5:18 p.m. UTC | #7
On Fri, Nov 17, 2017 at 04:50:39PM +0000, Bart Van Assche wrote:
> 
> How about the following class of assignments in drivers/md/raid1.c:
> 
> 		mbio->bi_disk = (void *)conf->mirrors[i].rdev;
> 
> Should these assignments perhaps be followed by a mbio->bi_partno assignment?

No.  They assign a struct md_rdev to the bi_disk pointer, abusing it
for internal storage.  They should not assign mbio->bi_partno.
Instead we should figure out a way to get rid of this.

> 
> How about the following class of assignments in the NVMe code:
> 
>                bio->bi_disk = disk;
> 
> Should these assignments perhaps be followed by a bio->bi_partno assignment?

No, the multipath code always works on whole namespaces, not partitions.
Campbell Steven Nov. 17, 2017, 8:25 p.m. UTC | #8
On 17 November 2017 at 20:47, Michael Lyle <mlyle@lyle.org> wrote:
> A new field was introduced in 74d46992e0d9dee7f1f376de0d56d31614c8a17a,
> bi_partno, instead of using bdev->bd_contains and encoding the partition
> information in the bi_bdev field.  __bio_clone_fast was changed to copy
> the disk information, but not the partition information.  At minimum,
> this regressed bcache and caused data corruption.

Thanks alot for this Michael, I will run it up on our test servers today.

Campbell
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index 101c2a9b5481..33fa6b4af312 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -597,6 +597,7 @@  void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 	 * so we don't set nor calculate new physical/hw segment counts here
 	 */
 	bio->bi_disk = bio_src->bi_disk;
+	bio->bi_partno = bio_src->bi_partno;
 	bio_set_flag(bio, BIO_CLONED);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_write_hint = bio_src->bi_write_hint;