block: bugfix for Amiga partition overflow check patch

Message ID	20230701023524.7434-1-schmitzmic@gmail.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@vger.kernel.org> From: Michael Schmitz <schmitzmic@gmail.com> To: linux-block@vger.kernel.org, axboe@kernel.dk Cc: linux-m68k@vger.kernel.org, chzigotzky@xenosoft.de, geert@linux-m68k.org, hch@lst.de, martin@lichtvoll.de, Michael Schmitz <schmitzmic@gmail.com>, stable@vger.kernel.org Subject: [PATCH] block: bugfix for Amiga partition overflow check patch Date: Sat, 1 Jul 2023 14:35:24 +1200 Message-Id: <20230701023524.7434-1-schmitzmic@gmail.com> Precedence: bulk
Series	block: bugfix for Amiga partition overflow check patch \| expand block: bugfix for Amiga partition overflow check patch

Michael Schmitz July 1, 2023, 2:35 a.m. UTC

Making 'blk' sector_t (i.e. 64 bit if LBD support is active)
fails the 'blk>0' test in the partition block loop if a
value of (signed int) -1 is used to mark the end of the
partition block list.

This bug was introduced in patch 3 of my prior Amiga partition
support fixes series, and spotted by Christian Zigotzky when
testing the latest block updates.

Explicitly cast 'blk' to signed int to allow use of -1 to
terminate the partition block linked list.

Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Fixes: b6f3f28f60 ("Linux 6.4")
Message-ID: 024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
Cc: <stable@vger.kernel.org> # 6.4
Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
---
 block/partitions/amiga.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Christian Zigotzky July 1, 2023, 6:40 a.m. UTC | #1

On 01 July 2023 at 04:35 am, Michael Schmitz wrote:
> Making 'blk' sector_t (i.e. 64 bit if LBD support is active)
> fails the 'blk>0' test in the partition block loop if a
> value of (signed int) -1 is used to mark the end of the
> partition block list.
>
> This bug was introduced in patch 3 of my prior Amiga partition
> support fixes series, and spotted by Christian Zigotzky when
> testing the latest block updates.
>
> Explicitly cast 'blk' to signed int to allow use of -1 to
> terminate the partition block linked list.
>
> Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
> Fixes: b6f3f28f60 ("Linux 6.4")
> Message-ID: 024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
> Cc: <stable@vger.kernel.org> # 6.4
> Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
> Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
> ---
>   block/partitions/amiga.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
> index ed222b9c901b..506921095412 100644
> --- a/block/partitions/amiga.c
> +++ b/block/partitions/amiga.c
> @@ -90,7 +90,7 @@ int amiga_partition(struct parsed_partitions *state)
>   	}
>   	blk = be32_to_cpu(rdb->rdb_PartitionList);
>   	put_dev_sector(sect);
> -	for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
> +	for (part = 1; (s32) blk>0 && part<=16; part++, put_dev_sector(sect)) {
>   		/* Read in terms partition table understands */
>   		if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
>   			pr_err("Dev %s: overflow calculating partition block %llu! Skipping partitions %u and beyond\n",
Hello Michael,

Thanks for your patch.

I patched the latest git kernel source code with your patch today but 
unfortunately the kernel has reported a bad geometry. (EXT4-fs (sda4): 
bad geometry: block count ...)

dmesg | grep -i sda

[    4.025449] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks: 
(2.00 TB/1.82 TiB)
[    4.071978] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    4.119333] sd 0:0:0:0: [sda] Write Protect is off
[    4.165958] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    4.212921] sd 0:0:0:0: [sda] Write cache: enabled, read cache: 
enabled, doesn't support DPO or FUA
[    4.259469] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
[    4.502519]  sda: RDSK (512) sda1 (DOS^G)(res 2 spb 2) sda2 
(SFS^B)(res 2 spb 1) sda3 (SFS^B)(res 2 spb 2) sda4 ((res 2 spb 1)
[    4.551981] sd 0:0:0:0: [sda] Attached SCSI disk
[   82.421727] EXT4-fs (sda4): bad geometry: block count 319655862 
exceeds size of device (317690430 blocks)

I can't mount the ext4 partition on the RDB disk and booting isn't 
possible as well.

Thanks,
Christian

Michael Schmitz July 1, 2023, 8:11 a.m. UTC | #2

Hi Christian,

Am 01.07.2023 um 18:40 schrieb Christian Zigotzky:
> On 01 July 2023 at 04:35 am, Michael Schmitz wrote:
>> Making 'blk' sector_t (i.e. 64 bit if LBD support is active)
>> fails the 'blk>0' test in the partition block loop if a
>> value of (signed int) -1 is used to mark the end of the
>> partition block list.
>>
>> This bug was introduced in patch 3 of my prior Amiga partition
>> support fixes series, and spotted by Christian Zigotzky when
>> testing the latest block updates.
>>
>> Explicitly cast 'blk' to signed int to allow use of -1 to
>> terminate the partition block linked list.
>>
>> Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
>> Fixes: b6f3f28f60 ("Linux 6.4")
>> Message-ID: 024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
>> Cc: <stable@vger.kernel.org> # 6.4
>> Link:
>> https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
>>
>> Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
>> ---
>>   block/partitions/amiga.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
>> index ed222b9c901b..506921095412 100644
>> --- a/block/partitions/amiga.c
>> +++ b/block/partitions/amiga.c
>> @@ -90,7 +90,7 @@ int amiga_partition(struct parsed_partitions *state)
>>       }
>>       blk = be32_to_cpu(rdb->rdb_PartitionList);
>>       put_dev_sector(sect);
>> -    for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
>> +    for (part = 1; (s32) blk>0 && part<=16; part++,
>> put_dev_sector(sect)) {
>>           /* Read in terms partition table understands */
>>           if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
>>               pr_err("Dev %s: overflow calculating partition block
>> %llu! Skipping partitions %u and beyond\n",
> Hello Michael,
>
> Thanks for your patch.
>
> I patched the latest git kernel source code with your patch today but
> unfortunately the kernel has reported a bad geometry. (EXT4-fs (sda4):
> bad geometry: block count ...)
>
> dmesg | grep -i sda
>
> [    4.025449] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks:
> (2.00 TB/1.82 TiB)
> [    4.071978] sd 0:0:0:0: [sda] 4096-byte physical blocks
> [    4.119333] sd 0:0:0:0: [sda] Write Protect is off
> [    4.165958] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [    4.212921] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [    4.259469] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
> [    4.502519]  sda: RDSK (512) sda1 (DOS^G)(res 2 spb 2) sda2
> (SFS^B)(res 2 spb 1) sda3 (SFS^B)(res 2 spb 2) sda4 ((res 2 spb 1)

Good - all partitions are recognized again as they ought to be.

> [    4.551981] sd 0:0:0:0: [sda] Attached SCSI disk
> [   82.421727] EXT4-fs (sda4): bad geometry: block count 319655862
> exceeds size of device (317690430 blocks)

Now that may be a new bug... or just a filesystem created with incorrect 
assumptions about partition size.

That is the partition that had reported:

> sda: p4 size 18446744071956107760 extends beyond EOD, truncated

with my patches backed out? I wonder what size mkfs used when creating 
the filesystem?

The calculated size was clearly incorrect, but the truncated size may 
also be incorrect. The truncated size is likely that of a partition 
extending to the end of the disk, but your actual p4 size may have been 
smaller (your parted -l output had 1992GB which is 8 GB shorter than to 
EOD).

On the face of it, this does not look like a new bug in the RDB 
partition code, but I'd rather verify that by inspecting the RDB 
contents and carrying out the calculations by hand.

Can you please send a copy of the RDB (first few kB of the disk, 
something like dd if=/dev/sda of=rdb-sda.img bs=512 count=16 should do), 
and the output of cat /proc/partitions when running a kernel from before 
my RDB patches?

> I can't mount the ext4 partition on the RDB disk and booting isn't
> possible as well.

I suppose the ext4 filesystem must be resized to match the actual 
partition size. I don't think that can be done on a live, mounted 
filesystem. You may have to hook up the disk to another Linux host for 
that, and use resize2fs there (or boot a ramdisk containing that tool).

Cheers,

	Michael

>
> Thanks,
> Christian

Christian Zigotzky July 1, 2023, 9:48 a.m. UTC | #3

On 01 July 2023 at 10:11 am, Michael Schmitz wrote:
> Hi Christian,
>
> Am 01.07.2023 um 18:40 schrieb Christian Zigotzky:
>> On 01 July 2023 at 04:35 am, Michael Schmitz wrote:
>>> Making 'blk' sector_t (i.e. 64 bit if LBD support is active)
>>> fails the 'blk>0' test in the partition block loop if a
>>> value of (signed int) -1 is used to mark the end of the
>>> partition block list.
>>>
>>> This bug was introduced in patch 3 of my prior Amiga partition
>>> support fixes series, and spotted by Christian Zigotzky when
>>> testing the latest block updates.
>>>
>>> Explicitly cast 'blk' to signed int to allow use of -1 to
>>> terminate the partition block linked list.
>>>
>>> Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
>>> Fixes: b6f3f28f60 ("Linux 6.4")
>>> Message-ID: 024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
>>> Cc: <stable@vger.kernel.org> # 6.4
>>> Link:
>>> https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de 
>>>
>>>
>>> Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
>>> ---
>>>   block/partitions/amiga.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
>>> index ed222b9c901b..506921095412 100644
>>> --- a/block/partitions/amiga.c
>>> +++ b/block/partitions/amiga.c
>>> @@ -90,7 +90,7 @@ int amiga_partition(struct parsed_partitions *state)
>>>       }
>>>       blk = be32_to_cpu(rdb->rdb_PartitionList);
>>>       put_dev_sector(sect);
>>> -    for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
>>> +    for (part = 1; (s32) blk>0 && part<=16; part++,
>>> put_dev_sector(sect)) {
>>>           /* Read in terms partition table understands */
>>>           if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
>>>               pr_err("Dev %s: overflow calculating partition block
>>> %llu! Skipping partitions %u and beyond\n",
>> Hello Michael,
>>
>> Thanks for your patch.
>>
>> I patched the latest git kernel source code with your patch today but
>> unfortunately the kernel has reported a bad geometry. (EXT4-fs (sda4):
>> bad geometry: block count ...)
>>
>> dmesg | grep -i sda
>>
>> [    4.025449] sd 0:0:0:0: [sda] 3907029168 512-byte logical blocks:
>> (2.00 TB/1.82 TiB)
>> [    4.071978] sd 0:0:0:0: [sda] 4096-byte physical blocks
>> [    4.119333] sd 0:0:0:0: [sda] Write Protect is off
>> [    4.165958] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
>> [    4.212921] sd 0:0:0:0: [sda] Write cache: enabled, read cache:
>> enabled, doesn't support DPO or FUA
>> [    4.259469] sd 0:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
>> [    4.502519]  sda: RDSK (512) sda1 (DOS^G)(res 2 spb 2) sda2
>> (SFS^B)(res 2 spb 1) sda3 (SFS^B)(res 2 spb 2) sda4 ((res 2 spb 1)
>
> Good - all partitions are recognized again as they ought to be.
>
>> [    4.551981] sd 0:0:0:0: [sda] Attached SCSI disk
>> [   82.421727] EXT4-fs (sda4): bad geometry: block count 319655862
>> exceeds size of device (317690430 blocks)
>
> Now that may be a new bug... or just a filesystem created with 
> incorrect assumptions about partition size.
>
> That is the partition that had reported:
>
>> sda: p4 size 18446744071956107760 extends beyond EOD, truncated
>
> with my patches backed out? I wonder what size mkfs used when creating 
> the filesystem?
>
> The calculated size was clearly incorrect, but the truncated size may 
> also be incorrect. The truncated size is likely that of a partition 
> extending to the end of the disk, but your actual p4 size may have 
> been smaller (your parted -l output had 1992GB which is 8 GB shorter 
> than to EOD).
>
> On the face of it, this does not look like a new bug in the RDB 
> partition code, but I'd rather verify that by inspecting the RDB 
> contents and carrying out the calculations by hand.
>
> Can you please send a copy of the RDB (first few kB of the disk, 
> something like dd if=/dev/sda of=rdb-sda.img bs=512 count=16 should 
> do), and the output of cat /proc/partitions when running a kernel from 
> before my RDB patches?
>
>
Copy of the RDB: https://www.xenosoft.de/rdb-sda.img

cat /proc/partitions:

major minor  #blocks  name

    8        0 1953514584 sda
    8        1     119088 sda1
    8        2    2100096 sda2
    8        3  672670920 sda3
    8        4 1278623448 sda4
   11        0    1048575 sr0
    8       32     250880 sdc
    8       33     249856 sdc1
    8       16  234431064 sdb
    8       17  144364512 sdb1
    8       18          1 sdb2
    8       19   18500608 sdb3
    8       20   40717312 sdb4
    8       21   14684160 sdb5
    8       22   16161792 sdb6
    8       48       1440 sdd
    8       49       1439 sdd1

Thanks

Michael Schmitz July 2, 2023, 2:17 a.m. UTC | #4

Hi Christian,

Am 01.07.2023 um 21:48 schrieb Christian Zigotzky:
>> Can you please send a copy of the RDB (first few kB of the disk,
>> something like dd if=/dev/sda of=rdb-sda.img bs=512 count=16 should
>> do), and the output of cat /proc/partitions when running a kernel from
>> before my RDB patches?
>>
>>
> Copy of the RDB: https://www.xenosoft.de/rdb-sda.img

Thanks, casual inspection of this RDB does show that indeed a value of 
-1 is used as pb_next in partition slot 4 (and 5).

The disk geometry is defined as 3 heads, 16 sectors per track (48 
sectors per cylinder) and 81396441 cylinders which matches your 2 TB 
disk size.

The first partition begins at cylinder 43 and ends at cylinder 5004, 
matching the 119088 k (k == 1024 bytes) below.

Partition 2 begins at cylinder 5005, ends at 92508, size 2100096 again 
as below.

Partition 3 begins at cylinder 92509, ends at 28120463, size 672670920 
as below.

Partition 4 begins at cylinder 28120464, ends at 81068868, size 
1270041720, different from the size shown in your /proc/partitions log.

The disk ends at cylinder 81396440, so a partition 4 extending to the 
end of the disk would have size 1278623448, which is what your log shows.

>
> cat /proc/partitions:
>
> major minor  #blocks  name
>
>    8        0 1953514584 sda
>    8        1     119088 sda1
>    8        2    2100096 sda2
>    8        3  672670920 sda3
>    8        4 1278623448 sda4

I have (disk image on sdb, patches applied):
      8       20 1270761720 sdb4

which matches what I calculated by hand above.

With an old kernel that does not have the RDB fixes, I get the same 
partition size as you report. That size is the result of truncation to 
EOD (the miscalculated size of 18446744071956107760 exceeds the device 
size).

Creating the filesystem on an unpatched kernel will use that incorrect 
partition size. I'm sorry to say I cannot see a new RDB partition bug 
her, just the result of undefined behaviour due to overflowing a 32 bit 
nr_sect size calculation in the old RDB code.


If you cannot shrink the filesystem, you will have to edit the partition 
table to extend p4 to the end of the disk. Just replace the partition 4 
pb->pb_Environment[10] (at offset 0x8a8, current value 0x04d50344) by 
0x04da02d8. As far as I can see, there is no adjustment to the partition 
block checksum required, as the checksummed block of 160 bytes ends just 
before the location of the partition's low and high cylinder addresses....

I'd best verify that a patched RDB actually works...

Cheers,

	Michael



>   11        0    1048575 sr0
>    8       32     250880 sdc
>    8       33     249856 sdc1
>    8       16  234431064 sdb
>    8       17  144364512 sdb1
>    8       18          1 sdb2
>    8       19   18500608 sdb3
>    8       20   40717312 sdb4
>    8       21   14684160 sdb5
>    8       22   16161792 sdb6
>    8       48       1440 sdd
>    8       49       1439 sdd1

Michael Schmitz July 2, 2023, 3:45 a.m. UTC | #5

Hi Christian,

Am 02.07.2023 um 14:17 schrieb Michael Schmitz:
> If you cannot shrink the filesystem, you will have to edit the partition
> table to extend p4 to the end of the disk. Just replace the partition 4
> pb->pb_Environment[10] (at offset 0x8a8, current value 0x04d50344) by
> 0x04da02d8. As far as I can see, there is no adjustment to the partition
> block checksum required, as the checksummed block of 160 bytes ends just
> before the location of the partition's low and high cylinder addresses....
>
> I'd best verify that a patched RDB actually works...

Argh - the checksum is over 160 longwords, not bytes.

Replace the checksum (at offset 0x808, currently 0x25d82b54) by 0x25d32bc0.

With these changes, I get the same size for partition 4 as you got by 
truncation with the original RDB partition code. That might be the 
fastest way to get your Linux partition mountable again.

Cheers,

	Michael


>
> Cheers,
>
>     Michael
>
>
>
>>   11        0    1048575 sr0
>>    8       32     250880 sdc
>>    8       33     249856 sdc1
>>    8       16  234431064 sdb
>>    8       17  144364512 sdb1
>>    8       18          1 sdb2
>>    8       19   18500608 sdb3
>>    8       20   40717312 sdb4
>>    8       21   14684160 sdb5
>>    8       22   16161792 sdb6
>>    8       48       1440 sdd
>>    8       49       1439 sdd1
>

Christian Zigotzky July 2, 2023, 4:37 a.m. UTC | #6

On 02 July 2023 at 04:17 am, Michael Schmitz wrote:
>  I'm sorry to say I cannot see a new RDB partition bug her, just the 
> result of undefined behaviour due to overflowing a 32 bit nr_sect size 
> calculation in the old RDB code.
>
Hello Michael,

Thanks a lot for your explanation!

Actually, we need your patch because there is a very old bug but there 
are some endusers with RDB disks with Linux partitions. If I apply your 
patch to our kernels, then I need an enduser friendly solution for 
fixing the issue with their file systems.

Do you have a solution for the enduser (consumer)? How can they fix 
their file systems? The next issue is, if an enduser uses an old 
unpatched kernel with a partition, created with a new patched kernel. I 
don't know how can I handle this issue in the consumer support.

I can't help all endusers and some are not active in forums etc.

Thanks,
Christian

Martin Steigerwald July 2, 2023, 7:55 a.m. UTC | #7

Christian Zigotzky - 02.07.23, 06:37:50 CEST:
> On 02 July 2023 at 04:17 am, Michael Schmitz wrote:
> >  I'm sorry to say I cannot see a new RDB partition bug her, just the
> > result of undefined behaviour due to overflowing a 32 bit nr_sect
> > size calculation in the old RDB code.
[…]
> Thanks a lot for your explanation!
> 
> Actually, we need your patch because there is a very old bug but there
> are some endusers with RDB disks with Linux partitions. If I apply
> your patch to our kernels, then I need an enduser friendly solution
> for fixing the issue with their file systems.

I have read through the last mails without commenting. I admit: I do not 
yet get what is wrong here? A checksum was miscalculated? Is this a 
regular thing to happen when using RDB disks with Linux partitions?

I do not yet get how this issue happens. What are the steps to reproduce 
it? And how likely is it to run into it?

> Do you have a solution for the enduser (consumer)? How can they fix
> their file systems? The next issue is, if an enduser uses an old
> unpatched kernel with a partition, created with a new patched kernel.
> I don't know how can I handle this issue in the consumer support.
> 
> I can't help all endusers and some are not active in forums etc.

How many end users are you speaking of?

Back then I thought I was the only one using a hard disk with mixed 
Amiga/Linux RDB setup.

Best,

Christian Zigotzky July 2, 2023, 8:56 a.m. UTC | #8

On 2. Jul 2023, at 09:55, Martin Steigerwald <martin@lichtvoll.de> wrote:

How many end users are you speaking of?

Back then I thought I was the only one using a hard disk with mixed 
Amiga/Linux RDB setup.

Best,

Christian Zigotzky July 2, 2023, 9:34 a.m. UTC | #9

On 2. Jul 2023, at 10:56, Christian Zigotzky <chzigotzky@xenosoft.de> wrote:

On 2. Jul 2023, at 09:55, Martin Steigerwald <martin@lichtvoll.de> wrote:

How many end users are you speaking of?

Back then I thought I was the only one using a hard disk with mixed 
Amiga/Linux RDB setup.

Best,

John Paul Adrian Glaubitz July 2, 2023, 9:51 a.m. UTC | #10

> On Jul 2, 2023, at 11:38 AM, Christian Zigotzky <chzigotzky@xenosoft.de> wrote:
> 
> The end users have to fix their  RDBs if they want to use the new patched kernels.
> 
> But a normal user can’t edit the RDB manually. What can we do for the end users?

I would suggest writing a small bash script to do that.

Adrian

Martin Steigerwald July 2, 2023, 10:34 a.m. UTC | #11

Hi!

Christian Zigotzky - 02.07.23, 11:34:27 CEST:
> A lot. :-) I am speaking about the new A-EON machines.

Ah yes, that is what the PASEMI is about, I did not recall initially. So 
far I decided to skip on these.

I agree with John that a small shell script could do the trick.

However I still do not understand the issue fully. So I have no idea 
what would be the best approach there.

Best,

Michael Schmitz July 2, 2023, 8:22 p.m. UTC | #12

Hi Martin

On 2/07/23 19:55, Martin Steigerwald wrote:
> Christian Zigotzky - 02.07.23, 06:37:50 CEST:
>> On 02 July 2023 at 04:17 am, Michael Schmitz wrote:
>>>   I'm sorry to say I cannot see a new RDB partition bug her, just the
>>> result of undefined behaviour due to overflowing a 32 bit nr_sect
>>> size calculation in the old RDB code.
> […]
>> Thanks a lot for your explanation!
>>
>> Actually, we need your patch because there is a very old bug but there
>> are some endusers with RDB disks with Linux partitions. If I apply
>> your patch to our kernels, then I need an enduser friendly solution
>> for fixing the issue with their file systems.
> I have read through the last mails without commenting. I admit: I do not
> yet get what is wrong here? A checksum was miscalculated? Is this a
> regular thing to happen when using RDB disks with Linux partitions?
I sent instructions to Christian on how to fix his partition table so 
the size mismatch between partition and filesystem (caused by the old 
RDB code) can be avoided, and misreading the checksum calculation code I 
forgot to update the checksum. That's all.

Cheers,

     Michael

Michael Schmitz July 3, 2023, 1:57 a.m. UTC | #13

Hi Christian,

please stop placing your replies underneath the previous mail's 
signature separator. There are mail clients that won't copy such text 
when composing a reply.

Am 02.07.2023 um 21:34 schrieb Christian Zigotzky:
>
>
> On 2. Jul 2023, at 10:56, Christian Zigotzky <chzigotzky@xenosoft.de> wrote:
>
> On 2. Jul 2023, at 09:55, Martin Steigerwald <martin@lichtvoll.de> wrote:
>
> How many end users are you speaking of?
>
> Back then I thought I was the only one using a hard disk with mixed
> Amiga/Linux RDB setup.
>
> Best,
>
>
> Martin
>
>
>
> A lot.  I am speaking about the new A-EON machines.
>
>
>
> The end users have to fix their  RDBs if they want to use the new patched kernels.

So what you're saying is that you have let your end users use RDB 
partitions on the old kernels that had a bug against them in the RDB 
code for eleven years, and proposed bugfixes for as long, patches to 
resolve the problem submitted to linux-block for the last five years, 
and you never once stopped to investigate what the ramifications of this 
bug were, and what the consequences of the proposed bugfix would be?

The discussion of this bug among Martin, Joanne and Geert didn't leave a 
lot to imagination as regards data corruption potential.

> But a normal user can’t edit the RDB manually. What can we do for the end users?

End users can use whatever tool they happened to use to partition the 
disk in the first instance, and correct the partition table that way.

Leaving 8 GB unused at the end of the disk can't be some feature of 
Amiga partition editors, leastways not one that can't be overridden?

But if you want to support your large userbase by a convenient solution, 
may I suggest you write a small tool that gets the disks's end cylinder 
from the RDB (field rdb_HiCylinder, offset 0x8c), then walk the 
partition list starting from rdb_PartitionList for the first partition 
block, then pb_Next for the next one, find the last valid partition 
(where pb_Next is 0xffffffff), and check whether the partition size 
calculation for that partition (in 32 bit arithmetic) would cause the 
partition end to land beyond EOD. If that is the case, the old kernel 
code would have truncated that partition to exactly EOD, and you have to 
change the partition end cylinder value (offset 0xa8 from the start of 
that partition block) to the value of rdb_HiCylinder. Adjust the 
partition checksum (offset 0x8) by the difference of the old and new 
values (i.e. add (old-new) to the checksum stored there) and you should 
have a valid partition to the end of the disk.

Might be a bit tough in a shell script but not too hard in Perl or Python.

Putting that kind of fix in the kernel would be asking Jens and Linus 
(and quite a few others) to yell at me and call me names, and for very 
good reason (but of course you can always do that in kernels you 
distribute to your end users).

Cheers,

	Michael

Martin Steigerwald July 3, 2023, 7:05 a.m. UTC | #14

Michael Schmitz - 02.07.23, 22:22:27 CEST:
> > I have read through the last mails without commenting. I admit: I do
> > not yet get what is wrong here? A checksum was miscalculated? Is
> > this a regular thing to happen when using RDB disks with Linux
> > partitions?
> I sent instructions to Christian on how to fix his partition table so
> the size mismatch between partition and filesystem (caused by the old
> RDB code) can be avoided, and misreading the checksum calculation code
> I forgot to update the checksum. That's all.

Ah okay. Sure, the checksum needs to be updated then.

From what I thought I gathered from Christian, I thought that his issue 
would be something that would automatically be triggered by just using 
disks Amiga + Linux RDB partitioning setup. And I did not get, how any 
tool on AmigaOS would create partition tables with errors like too large 
partitions in them. I am not completely sure about the amiga-fdisk tool 
for Linux, but even there I would be surprised if it would allow to 
create such a partition table. Especially given that as I remember back 
then when I faced the overflow issue amiga-fdisk showed the correct 
values. I always suggest to use a tool on AmigaOS however.

So that was it: I did not get how Christian comes to claim that so many 
users were affected with incorrect partition tables, cause frankly Amiga 
RDB partitioning tools are not actually famous for creating incorrect 
partition tables like this. There has been some compatibility issue 
between some Phase 5 tool with a name I do not remember and the other 
tools back then I believe, but it was not about partition sizes. 
Especially if you use a HDToolBox from any AmigaOS version up to 3.x or 
Media Toolbox from AmigaOS 4.x with automatic geometry calculation, I 
never heard of such a partition to large error in the partition table. 
Those tools simply do not allow creating that.

So, Christian, unless you can actually enlighten us on a reproducible 
way how users with those setups end up with incorrect partition tables 
like this, I consider this case closed. So far you didn't.

Ciao,

Christian Zigotzky July 3, 2023, 2:19 p.m. UTC | #15

On 03.07.23 09:05, Martin Steigerwald wrote:
> So, Christian, unless you can actually enlighten us on a reproducible
> way how users with those setups end up with incorrect partition tables
> like this, I consider this case closed. So far you didn't.
>
> Ciao,
This is a very simple explanation. The first partitions on the RDB disk 
were created with Media Toolbox on OS4.1. After that some additional 
partitions were created with Linux and formatted with ext4.
With the new patched kernel, these can no longer be mounted.

I will try out, if I can correct them with GParted. If it works, then I 
will write some instructions for correcting the partitions via GParted 
for the end users.

GParted is a good tool and suitable for our customers.

I know, this is a very old bug and no one has noticed this one. I have 
not received any error messages because of Linux partitions on RDB disks 
in the last 10 years. I am very happy that this bug is fixed now but we 
have to explain it to our customers why they can't mount their Linux 
partitions on the RDB disk anymore. Booting is of course also affected. 
(Mounting the root partition)

But maybe simple GParted instructions is a good solution.

Christian Zigotzky July 3, 2023, 2:59 p.m. UTC | #16

On 03.07.23 16:19, Christian Zigotzky wrote:
> On 03.07.23 09:05, Martin Steigerwald wrote:
>> So, Christian, unless you can actually enlighten us on a reproducible
>> way how users with those setups end up with incorrect partition tables
>> like this, I consider this case closed. So far you didn't.
>>
>> Ciao,
> This is a very simple explanation. The first partitions on the RDB 
> disk were created with Media Toolbox on OS4.1. After that some 
> additional partitions were created with Linux and formatted with ext4.
> With the new patched kernel, these can no longer be mounted.
>
> I will try out, if I can correct them with GParted. If it works, then 
> I will write some instructions for correcting the partitions via 
> GParted for the end users.
>
> GParted is a good tool and suitable for our customers.
>
> I know, this is a very old bug and no one has noticed this one. I have 
> not received any error messages because of Linux partitions on RDB 
> disks in the last 10 years. I am very happy that this bug is fixed now 
> but we have to explain it to our customers why they can't mount their 
> Linux partitions on the RDB disk anymore. Booting is of course also 
> affected. (Mounting the root partition)
>
> But maybe simple GParted instructions are a good solution.
You can apply the patch. I will revert this patch until I find a simple 
solution for our community.

Thank you for fixing this issue!

Michael Schmitz July 3, 2023, 9:24 p.m. UTC | #17

Hi Christian,

On 4/07/23 02:59, Christian Zigotzky wrote:
>> I am very happy that this bug is fixed now but we have to explain it 
>> to our customers why they can't mount their Linux partitions on the 
>> RDB disk anymore. Booting is of course also affected. (Mounting the 
>> root partition)
>>
>> But maybe simple GParted instructions are a good solution.
> You can apply the patch. I will revert this patch until I find a 
> simple solution for our community.
>
> Thank you for fixing this issue!

Thanks for testing - I'll add your Tested-by: tag now. I have to correct 
the Fixes: tag anyway.

Jens - is the bugfix patch enough, or do you need a new version of the 
entire series?

Cheers,

     Michael

Jens Axboe July 3, 2023, 9:27 p.m. UTC | #18

On 7/3/23 3:24?PM, Michael Schmitz wrote:
> Hi Christian,
> 
> On 4/07/23 02:59, Christian Zigotzky wrote:
>>> I am very happy that this bug is fixed now but we have to explain it to our customers why they can't mount their Linux partitions on the RDB disk anymore. Booting is of course also affected. (Mounting the root partition)
>>>
>>> But maybe simple GParted instructions are a good solution.
>> You can apply the patch. I will revert this patch until I find a simple solution for our community.
>>
>> Thank you for fixing this issue!
> 
> Thanks for testing - I'll add your Tested-by: tag now. I have to
> correct the Fixes: tag anyway.
> 
> Jens - is the bugfix patch enough, or do you need a new version of the
> entire series?

Well, the whole series is already upstream, so that part is set in
stone. What I'm unclear on is if the final fix is the parent of this
thread, or if there's later version burried somewhere within this big
thread?

Michael Schmitz July 3, 2023, 10:43 p.m. UTC | #19

Hi Jens,

On 4/07/23 09:27, Jens Axboe wrote:
> On 7/3/23 3:24?PM, Michael Schmitz wrote:
>> Hi Christian,
>>
>> On 4/07/23 02:59, Christian Zigotzky wrote:
>>>> I am very happy that this bug is fixed now but we have to explain it to our customers why they can't mount their Linux partitions on the RDB disk anymore. Booting is of course also affected. (Mounting the root partition)
>>>>
>>>> But maybe simple GParted instructions are a good solution.
>>> You can apply the patch. I will revert this patch until I find a simple solution for our community.
>>>
>>> Thank you for fixing this issue!
>> Thanks for testing - I'll add your Tested-by: tag now. I have to
>> correct the Fixes: tag anyway.
>>
>> Jens - is the bugfix patch enough, or do you need a new version of the
>> entire series?
> Well, the whole series is already upstream, so that part is set in
That's what I thought, but thought to better ask...
> stone. What I'm unclear on is if the final fix is the parent of this
> thread, or if there's later version burried somewhere within this big
> thread?

It's the parent of this thread (Message ID 
<20230701023524.7434-1-schmitzmic@gmail.com>) but I'd botched up the 
Fixes: tag; might also need to reword the commit message to clearly 
explain the ramifications of commit b6f3f28f60 as regards partitions at 
the end of the disk that were subject to size overflow and truncation in 
previous kernels.

I'll send v2 of the bugfix shortly, OK?

Cheers,

     Michael

John Paul Adrian Glaubitz July 4, 2023, 5:06 a.m. UTC | #20

Hi Michael!

On Tue, 2023-07-04 at 09:24 +1200, Michael Schmitz wrote:
> Hi Christian,
> 
> On 4/07/23 02:59, Christian Zigotzky wrote:
> > > I am very happy that this bug is fixed now but we have to explain it 
> > > to our customers why they can't mount their Linux partitions on the 
> > > RDB disk anymore. Booting is of course also affected. (Mounting the 
> > > root partition)
> > > 
> > > But maybe simple GParted instructions are a good solution.
> > You can apply the patch. I will revert this patch until I find a 
> > simple solution for our community.
> > 
> > Thank you for fixing this issue!
> 
> Thanks for testing - I'll add your Tested-by: tag now. I have to correct 
> the Fixes: tag anyway.

Have we actually agreed now that this is a bug and not just an effect of the
corrupted RDB that Christian provided?

> Jens - is the bugfix patch enough, or do you need a new version of the 
> entire series?

But the series has already been applied and released in 6.4, hasn't it?

Adrian

Michael Schmitz July 4, 2023, 5:44 a.m. UTC | #21

Hi Adrian,

Am 04.07.2023 um 17:06 schrieb John Paul Adrian Glaubitz:
> Hi Michael!
>
> On Tue, 2023-07-04 at 09:24 +1200, Michael Schmitz wrote:
>> Hi Christian,
>>
>> On 4/07/23 02:59, Christian Zigotzky wrote:
>>>> I am very happy that this bug is fixed now but we have to explain it
>>>> to our customers why they can't mount their Linux partitions on the
>>>> RDB disk anymore. Booting is of course also affected. (Mounting the
>>>> root partition)
>>>>
>>>> But maybe simple GParted instructions are a good solution.
>>> You can apply the patch. I will revert this patch until I find a
>>> simple solution for our community.
>>>
>>> Thank you for fixing this issue!
>>
>> Thanks for testing - I'll add your Tested-by: tag now. I have to correct
>> the Fixes: tag anyway.
>
> Have we actually agreed now that this is a bug and not just an effect of the
> corrupted RDB that Christian provided?

The RDB was perfectly fine. Due to 32 bit integer arithmetic overflow, 
old RDB code passed an incorrect partition size to put_partition(),
and instead of rejecting a partition that extends past the end of the 
disk, put_partition() truncated the size.
>
>> Jens - is the bugfix patch enough, or do you need a new version of the
>> entire series?
>
> But the series has already been applied and released in 6.4, hasn't it?

That's right - I wasn't sure whether it had already gone upstream (but 
even then, squeezing a bugfix in with an accepted patch isn't usually done).

Cheers,

	Michael

John Paul Adrian Glaubitz July 4, 2023, 5:48 a.m. UTC | #22

On Tue, 2023-07-04 at 17:44 +1200, Michael Schmitz wrote:
> Hi Adrian,
> 
> Am 04.07.2023 um 17:06 schrieb John Paul Adrian Glaubitz:
> > Hi Michael!
> > 
> > On Tue, 2023-07-04 at 09:24 +1200, Michael Schmitz wrote:
> > > Hi Christian,
> > > 
> > > On 4/07/23 02:59, Christian Zigotzky wrote:
> > > > > I am very happy that this bug is fixed now but we have to explain it
> > > > > to our customers why they can't mount their Linux partitions on the
> > > > > RDB disk anymore. Booting is of course also affected. (Mounting the
> > > > > root partition)
> > > > > 
> > > > > But maybe simple GParted instructions are a good solution.
> > > > You can apply the patch. I will revert this patch until I find a
> > > > simple solution for our community.
> > > > 
> > > > Thank you for fixing this issue!
> > > 
> > > Thanks for testing - I'll add your Tested-by: tag now. I have to correct
> > > the Fixes: tag anyway.
> > 
> > Have we actually agreed now that this is a bug and not just an effect of the
> > corrupted RDB that Christian provided?
> 
> The RDB was perfectly fine. Due to 32 bit integer arithmetic overflow, 
> old RDB code passed an incorrect partition size to put_partition(),
> and instead of rejecting a partition that extends past the end of the 
> disk, put_partition() truncated the size.

OK, so using "-1" as an end-of-disk partition marker is fine, but it was just
the partition size recorded in Christian's RDB that was incorrect, correct?

> > 
> > > Jens - is the bugfix patch enough, or do you need a new version of the
> > > entire series?
> > 
> > But the series has already been applied and released in 6.4, hasn't it?
> 
> That's right - I wasn't sure whether it had already gone upstream (but 
> even then, squeezing a bugfix in with an accepted patch isn't usually done).

It's even released already ;-). That's why Christian ran into the problem in the
first place.

Adrian

Michael Schmitz July 4, 2023, 5:58 a.m. UTC | #23

Hi Adrian,

Am 04.07.2023 um 17:48 schrieb John Paul Adrian Glaubitz:
>>>
>>> Have we actually agreed now that this is a bug and not just an effect of the
>>> corrupted RDB that Christian provided?
>>
>> The RDB was perfectly fine. Due to 32 bit integer arithmetic overflow,
>> old RDB code passed an incorrect partition size to put_partition(),
>> and instead of rejecting a partition that extends past the end of the
>> disk, put_partition() truncated the size.
>
> OK, so using "-1" as an end-of-disk partition marker is fine, but it was just
> the partition size recorded in Christian's RDB that was incorrect, correct?

No, the partition size in the RDB was correct (valid, end cylinder 
before end of disk). The partition size seen by user space tools when 
running the old kernels was incorrect. That lead to the filesystem size 
exceeding the partition size, which only came to light once the overflow 
fixes had gone in.

I know it does sound like semantic sophism, but we have to be clear that 
what the user put in the partition block is definite. I haven't had much 
luck with heuristics in kernel code lately...

>>>
>>>> Jens - is the bugfix patch enough, or do you need a new version of the
>>>> entire series?
>>>
>>> But the series has already been applied and released in 6.4, hasn't it?
>>
>> That's right - I wasn't sure whether it had already gone upstream (but
>> even then, squeezing a bugfix in with an accepted patch isn't usually done).
>
> It's even released already ;-). That's why Christian ran into the problem in the
> first place.

I had hoped he'd spotted it in linux-block ...

Cheers,

	Michael


>
> Adrian
>

Martin Steigerwald July 4, 2023, 7:28 a.m. UTC | #24

Michael Schmitz - 04.07.23, 07:58:13 CEST:
> > OK, so using "-1" as an end-of-disk partition marker is fine, but it
> > was just the partition size recorded in Christian's RDB that was
> > incorrect, correct?
> No, the partition size in the RDB was correct (valid, end cylinder
> before end of disk). The partition size seen by user space tools when
> running the old kernels was incorrect. That lead to the filesystem
> size exceeding the partition size, which only came to light once the
> overflow fixes had gone in.
> 
> I know it does sound like semantic sophism, but we have to be clear
> that what the user put in the partition block is definite. I haven't
> had much luck with heuristics in kernel code lately...

Now I finally get this issue, I think. Thanks for this explanation.

I think something like this would do good in the patch description.

Best,

block: bugfix for Amiga partition overflow check patch

Commit Message

Comments

Patch