[03/14] io_uring: specify freeptr usage for SLAB_TYPESAFE_BY_RCU io_kiocb cache

Message ID 20241029152249.667290-4-axboe@kernel.dk (mailing list archive)
State New
Series Rewrite rsrc node handling

Commit Message

Jens Axboe Oct. 29, 2024, 3:16 p.m. UTC
Doesn't matter right now as there's still some bytes left for it, but
let's prepare for the io_kiocb potentially growing and add a specific
freeptr offset for it.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
---
 io_uring/io_uring.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Guenter Roeck Nov. 19, 2024, 3:36 p.m. UTC | #1
Hi,

On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> Doesn't matter right now as there's still some bytes left for it, but
> let's prepare for the io_kiocb potentially growing and add a specific
> freeptr offset for it.
> 
> Signed-off-by: Jens Axboe <axboe@kernel.dk>

This patch triggers:

Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
Stack from 00c63e5c:
        00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
        004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
        00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
        004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
        00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
        00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
Call Trace: [<004b9044>] dump_stack+0xc/0x10
 [<004ae21e>] panic+0xc4/0x252
 [<000c6974>] __kmem_cache_create_args+0x216/0x26c
 [<004a72c2>] strcpy+0x0/0x1c
 [<0002cb62>] parse_args+0x0/0x1f2
 [<000c675e>] __kmem_cache_create_args+0x0/0x26c
 [<004adb58>] memset+0x0/0x8c
 [<0076f28a>] io_uring_init+0x4c/0xca
 [<0076f23e>] io_uring_init+0x0/0xca
 [<000020e0>] do_one_initcall+0x32/0x192
 [<0076f23e>] io_uring_init+0x0/0xca
 [<0000211c>] do_one_initcall+0x6e/0x192
 [<004a72c2>] strcpy+0x0/0x1c
 [<0002cb62>] parse_args+0x0/0x1f2
 [<000020ae>] do_one_initcall+0x0/0x192
 [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
 [<0076f23e>] io_uring_init+0x0/0xca
 [<004b911a>] kernel_init+0x0/0xec
 [<004b912e>] kernel_init+0x14/0xec
 [<004b911a>] kernel_init+0x0/0xec
 [<0000252c>] ret_from_kernel_thread+0xc/0x14

when trying to boot the m68k:q800 machine in qemu.

An added debug message in create_cache() shows the reason:

#### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4

freeptr_offset would need to be 4-byte aligned but that is not the case on m68k.
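
The arithmetic behind that debug line can be sketched in a few lines of C. This is only an illustration: `is_aligned()` is a hypothetical stand-in for the kernel's IS_ALIGNED() macro, and the constants are the values printed in the debug message above.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's IS_ALIGNED(); 'a' must be a power of two. */
static int is_aligned(size_t x, size_t a)
{
	return (x & (a - 1)) == 0;
}

/* Values from the debug message: freeptr_offset=154, sizeof(freeptr_t)=4. */
static int reported_offset_is_4byte_aligned(void)
{
	return is_aligned(154, 4);	/* 154 = 38 * 4 + 2, so this is false */
}

/* The same offset is fine for a 2-byte alignment requirement. */
static int reported_offset_is_2byte_aligned(void)
{
	return is_aligned(154, 2);
}
```

With these numbers the 4-byte test fails and the 2-byte test passes, which is consistent with aligned=0 in the debug output.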

Bisect log attached.

Guenter

---
# bad: [158f238aa69d91ad74e535c73f552bd4b025109c] Merge tag 'for-linus-6.13-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
# good: [adc218676eef25575469234709c2d87185ca223a] Linux 6.12
git bisect start '158f238aa69d' 'v6.12'
# good: [77a0cfafa9af9c0d5b43534eb90d530c189edca1] Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux
git bisect good 77a0cfafa9af9c0d5b43534eb90d530c189edca1
# bad: [0338cd9c22d1bce7dc4a6641d4215a50f476f429] Merge tag 's390-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect bad 0338cd9c22d1bce7dc4a6641d4215a50f476f429
# good: [fbe057e874c7037982dea38235e8b9a9be05a8d5] s390/cpu_mf: Convert to use flag output macros
git bisect good fbe057e874c7037982dea38235e8b9a9be05a8d5
# bad: [2f3cc8e441c9f657ff036c56baaab7dddbd0b350] io_uring/napi: protect concurrent io_napi_entry timeout accesses
git bisect bad 2f3cc8e441c9f657ff036c56baaab7dddbd0b350
# good: [d090bffab609762af06dec295a305ce270941b42] io_uring/memmap: explicitly return -EFAULT for mmap on NULL rings
git bisect good d090bffab609762af06dec295a305ce270941b42
# bad: [3597f2786b687a7f26361ce00a805ea0af41b65f] io_uring/rsrc: unify file and buffer resource tables
git bisect bad 3597f2786b687a7f26361ce00a805ea0af41b65f
# good: [ff1256b8f3c45f222bce19fbfc1e1bc498b31d03] io_uring/rsrc: move struct io_fixed_file to rsrc.h header
git bisect good ff1256b8f3c45f222bce19fbfc1e1bc498b31d03
# bad: [7029acd8a950393ee3a3d8e1a7ee1a9b77808a3b] io_uring/rsrc: get rid of per-ring io_rsrc_node list
git bisect bad 7029acd8a950393ee3a3d8e1a7ee1a9b77808a3b
# bad: [743fb58a35cde8fe27b07ee5a985ae76563845e3] io_uring/splice: open code 2nd direct file assignment
git bisect bad 743fb58a35cde8fe27b07ee5a985ae76563845e3
# bad: [aaa736b186239b7dc7778ae94c75f26c96972796] io_uring: specify freeptr usage for SLAB_TYPESAFE_BY_RCU io_kiocb cache
git bisect bad aaa736b186239b7dc7778ae94c75f26c96972796
# first bad commit: [aaa736b186239b7dc7778ae94c75f26c96972796] io_uring: specify freeptr usage for SLAB_TYPESAFE_BY_RCU io_kiocb cache
Jens Axboe Nov. 19, 2024, 4:02 p.m. UTC | #2
On 11/19/24 8:36 AM, Guenter Roeck wrote:
> Hi,
> 
> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>> Doesn't matter right now as there's still some bytes left for it, but
>> let's prepare for the io_kiocb potentially growing and add a specific
>> freeptr offset for it.
>>
>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> This patch triggers:
> 
> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> Stack from 00c63e5c:
>         00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>         004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>         00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>         004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>         00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>         00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>  [<004ae21e>] panic+0xc4/0x252
>  [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>  [<004a72c2>] strcpy+0x0/0x1c
>  [<0002cb62>] parse_args+0x0/0x1f2
>  [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>  [<004adb58>] memset+0x0/0x8c
>  [<0076f28a>] io_uring_init+0x4c/0xca
>  [<0076f23e>] io_uring_init+0x0/0xca
>  [<000020e0>] do_one_initcall+0x32/0x192
>  [<0076f23e>] io_uring_init+0x0/0xca
>  [<0000211c>] do_one_initcall+0x6e/0x192
>  [<004a72c2>] strcpy+0x0/0x1c
>  [<0002cb62>] parse_args+0x0/0x1f2
>  [<000020ae>] do_one_initcall+0x0/0x192
>  [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>  [<0076f23e>] io_uring_init+0x0/0xca
>  [<004b911a>] kernel_init+0x0/0xec
>  [<004b912e>] kernel_init+0x14/0xec
>  [<004b911a>] kernel_init+0x0/0xec
>  [<0000252c>] ret_from_kernel_thread+0xc/0x14
> 
> when trying to boot the m68k:q800 machine in qemu.
> 
> An added debug message in create_cache() shows the reason:
> 
> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> 
> freeptr_offset would need to be 4-byte aligned but that is not the
> case on m68k.

Why is ->work 2-byte aligned to begin with on m68k?!
Guenter Roeck Nov. 19, 2024, 4:21 p.m. UTC | #3
On 11/19/24 08:02, Jens Axboe wrote:
> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>> Hi,
>>
>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>> Doesn't matter right now as there's still some bytes left for it, but
>>> let's prepare for the io_kiocb potentially growing and add a specific
>>> freeptr offset for it.
>>>
>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>
>> This patch triggers:
>>
>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>> Stack from 00c63e5c:
>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>   [<004ae21e>] panic+0xc4/0x252
>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>   [<004a72c2>] strcpy+0x0/0x1c
>>   [<0002cb62>] parse_args+0x0/0x1f2
>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>   [<004adb58>] memset+0x0/0x8c
>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>   [<0076f23e>] io_uring_init+0x0/0xca
>>   [<000020e0>] do_one_initcall+0x32/0x192
>>   [<0076f23e>] io_uring_init+0x0/0xca
>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>   [<004a72c2>] strcpy+0x0/0x1c
>>   [<0002cb62>] parse_args+0x0/0x1f2
>>   [<000020ae>] do_one_initcall+0x0/0x192
>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>   [<0076f23e>] io_uring_init+0x0/0xca
>>   [<004b911a>] kernel_init+0x0/0xec
>>   [<004b912e>] kernel_init+0x14/0xec
>>   [<004b911a>] kernel_init+0x0/0xec
>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>
>> when trying to boot the m68k:q800 machine in qemu.
>>
>> An added debug message in create_cache() shows the reason:
>>
>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>
>> freeptr_offset would need to be 4-byte aligned but that is not the
>> case on m68k.
> 
> Why is ->work 2-byte aligned to begin with on m68k?!
> 

My understanding is that m68k does not align pointers.

Copying Geert and the m68k mailing list for feedback. Sorry, I should have done
that earlier.

Guenter
Geert Uytterhoeven Nov. 19, 2024, 5:49 p.m. UTC | #4
On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> On 11/19/24 08:02, Jens Axboe wrote:
> > On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >>> Doesn't matter right now as there's still some bytes left for it, but
> >>> let's prepare for the io_kiocb potentially growing and add a specific
> >>> freeptr offset for it.
> >>>
> >>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>
> >> This patch triggers:
> >>
> >> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >> Stack from 00c63e5c:
> >>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >>   [<004ae21e>] panic+0xc4/0x252
> >>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >>   [<004a72c2>] strcpy+0x0/0x1c
> >>   [<0002cb62>] parse_args+0x0/0x1f2
> >>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >>   [<004adb58>] memset+0x0/0x8c
> >>   [<0076f28a>] io_uring_init+0x4c/0xca
> >>   [<0076f23e>] io_uring_init+0x0/0xca
> >>   [<000020e0>] do_one_initcall+0x32/0x192
> >>   [<0076f23e>] io_uring_init+0x0/0xca
> >>   [<0000211c>] do_one_initcall+0x6e/0x192
> >>   [<004a72c2>] strcpy+0x0/0x1c
> >>   [<0002cb62>] parse_args+0x0/0x1f2
> >>   [<000020ae>] do_one_initcall+0x0/0x192
> >>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >>   [<0076f23e>] io_uring_init+0x0/0xca
> >>   [<004b911a>] kernel_init+0x0/0xec
> >>   [<004b912e>] kernel_init+0x14/0xec
> >>   [<004b911a>] kernel_init+0x0/0xec
> >>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >>
> >> when trying to boot the m68k:q800 machine in qemu.
> >>
> >> An added debug message in create_cache() shows the reason:
> >>
> >> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >>
> >> freeptr_offset would need to be 4-byte aligned but that is not the
> >> case on m68k.
> >
> > Why is ->work 2-byte aligned to begin with on m68k?!
>
> My understanding is that m68k does not align pointers.

The minimum alignment for multi-byte integral values on m68k is
2 bytes.

See also the comment at
https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
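
A rough way to see this effect on another host is to cap member alignment at 2 bytes by hand. The sketch below assumes a compiler that supports `#pragma pack` (GCC, Clang and MSVC do); the struct and function names are made up for the illustration and do not come from the kernel.

```c
#include <stddef.h>

/*
 * Emulate a 2-byte maximum member alignment on any host.
 * #pragma pack is a compiler extension, used here only for illustration.
 */
#pragma pack(push, 2)
struct demo_packed2 {
	unsigned short flags;	/* 2 bytes */
	void *ptr;		/* placed at offset 2, right after 'flags' */
};
#pragma pack(pop)

/* Same members, laid out with the host's natural alignment rules. */
struct demo_natural {
	unsigned short flags;
	void *ptr;
};

static size_t packed2_ptr_offset(void)
{
	return offsetof(struct demo_packed2, ptr);
}

static size_t natural_ptr_offset(void)
{
	return offsetof(struct demo_natural, ptr);
}
```

Under the 2-byte cap the pointer lands at offset 2, so an offset taken with offsetof() is only guaranteed to be a multiple of 2.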

Gr{oetje,eeting}s,

                        Geert
Jens Axboe Nov. 19, 2024, 7 p.m. UTC | #5
On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> On 11/19/24 08:02, Jens Axboe wrote:
>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>> freeptr offset for it.
>>>>>
>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>
>>>> This patch triggers:
>>>>
>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>> Stack from 00c63e5c:
>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>   [<004ae21e>] panic+0xc4/0x252
>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>   [<004adb58>] memset+0x0/0x8c
>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>   [<000020e0>] do_one_initcall+0x32/0x192
>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>   [<000020ae>] do_one_initcall+0x0/0x192
>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>   [<004b912e>] kernel_init+0x14/0xec
>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>
>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>
>>>> An added debug message in create_cache() shows the reason:
>>>>
>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>
>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>> case on m68k.
>>>
>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>
>> My understanding is that m68k does not align pointers.
> 
> The minimum alignment for multi-byte integral values on m68k is
> 2 bytes.
> 
> See also the comment at
> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46

Maybe it's time we put m68k to bed? :-)

We can force the alignment of ->work to be 4 bytes, won't change
anything on anything remotely current. But does feel pretty hacky to
need to align based on some ancient thing.
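
A minimal sketch of what forcing a member's alignment looks like, using C11 `alignas` so it builds outside the kernel tree (kernel code would more likely use the `__aligned()` attribute macro). The struct names are hypothetical and are not the real `io_kiocb` layout.

```c
#include <stdalign.h>
#include <stddef.h>

/* A member type whose natural alignment is only 2 bytes on typical ABIs. */
struct two_shorts {
	unsigned short a;
	unsigned short b;
};

/* Without forcing, 'work' may start directly after the 2-byte tag. */
struct demo_default {
	char tag[2];
	struct two_shorts work;
};

/* With forced alignment, padding is inserted so 'work' starts at offset 4. */
struct demo_forced {
	char tag[2];
	alignas(4) struct two_shorts work;
};

static size_t default_work_offset(void)
{
	return offsetof(struct demo_default, work);
}

static size_t forced_work_offset(void)
{
	return offsetof(struct demo_forced, work);
}
```

On ABIs that already align the member to 4 bytes or more, the forced variant produces the same layout as the default one.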
Geert Uytterhoeven Nov. 19, 2024, 7:02 p.m. UTC | #6
Hi Jens.

On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> > On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> On 11/19/24 08:02, Jens Axboe wrote:
> >>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >>>>> Doesn't matter right now as there's still some bytes left for it, but
> >>>>> let's prepare for the io_kiocb potentially growing and add a specific
> >>>>> freeptr offset for it.
> >>>>>
> >>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>
> >>>> This patch triggers:
> >>>>
> >>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >>>> Stack from 00c63e5c:
> >>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >>>>   [<004ae21e>] panic+0xc4/0x252
> >>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >>>>   [<004adb58>] memset+0x0/0x8c
> >>>>   [<0076f28a>] io_uring_init+0x4c/0xca
> >>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>   [<000020e0>] do_one_initcall+0x32/0x192
> >>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>   [<0000211c>] do_one_initcall+0x6e/0x192
> >>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>   [<000020ae>] do_one_initcall+0x0/0x192
> >>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>   [<004b912e>] kernel_init+0x14/0xec
> >>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >>>>
> >>>> when trying to boot the m68k:q800 machine in qemu.
> >>>>
> >>>> An added debug message in create_cache() shows the reason:
> >>>>
> >>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >>>>
> >>>> freeptr_offset would need to be 4-byte aligned but that is not the
> >>>> case on m68k.
> >>>
> >>> Why is ->work 2-byte aligned to begin with on m68k?!
> >>
> >> My understanding is that m68k does not align pointers.
> >
> > The minimum alignment for multi-byte integral values on m68k is
> > 2 bytes.
> >
> > See also the comment at
> > https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>
> Maybe it's time we put m68k to bed? :-)
>
> We can add a forced alignment ->work to be 4 bytes, won't change
> anything on anything remotely current. But does feel pretty hacky to
> need to align based on some ancient thing.

Why does freeptr_offset need to be 4-byte aligned?

Gr{oetje,eeting}s,

                        Geert
Jens Axboe Nov. 19, 2024, 7:10 p.m. UTC | #7
On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
> Hi Jens.
> 
> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>> freeptr offset for it.
>>>>>>>
>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>
>>>>>> This patch triggers:
>>>>>>
>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>> Stack from 00c63e5c:
>>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>   [<004ae21e>] panic+0xc4/0x252
>>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>   [<004adb58>] memset+0x0/0x8c
>>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>   [<004b912e>] kernel_init+0x14/0xec
>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>
>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>
>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>
>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>
>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>> case on m68k.
>>>>>
>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>
>>>> My understanding is that m68k does not align pointers.
>>>
>>> The minimum alignment for multi-byte integral values on m68k is
>>> 2 bytes.
>>>
>>> See also the comment at
>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>
>> Maybe it's time we put m68k to bed? :-)
>>
>> We can add a forced alignment ->work to be 4 bytes, won't change
>> anything on anything remotely current. But does feel pretty hacky to
>> need to align based on some ancient thing.
> 
> Why does freeptr_offset need to be 4-byte aligned?

Didn't check, but it's slab/slub complaining using a 2-byte aligned
address for the free pointer offset. It's explicitly checking:

	/* If a custom freelist pointer is requested make sure it's sane. */
	err = -EINVAL;
	if (args->use_freeptr_offset &&
	    (args->freeptr_offset >= object_size ||
	     !(flags & SLAB_TYPESAFE_BY_RCU) ||
	     !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
		goto out;
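
To make the failure concrete, here is a hedged user-space sketch of that condition. `demo_check_freeptr()` and `DEMO_TYPESAFE_BY_RCU` are hypothetical names for this example (the flag value is not the kernel's), and the function is a simplified mirror of the quoted check, not the actual slab code.

```c
#include <errno.h>
#include <stddef.h>

#define DEMO_TYPESAFE_BY_RCU 0x1u	/* hypothetical flag bit, not the kernel's value */

/* Simplified mirror of the quoted sanity check; returns 0 or -EINVAL. */
static int demo_check_freeptr(int use_freeptr_offset, size_t freeptr_offset,
			      size_t object_size, unsigned int flags,
			      size_t freeptr_size)
{
	if (use_freeptr_offset &&
	    (freeptr_offset >= object_size ||
	     !(flags & DEMO_TYPESAFE_BY_RCU) ||
	     freeptr_offset % freeptr_size != 0))
		return -EINVAL;
	return 0;
}
```

Called with the reported values (offset 154, object size 182, a free pointer size of 4), the sketch returns -EINVAL, which is -22 on Linux and matches the error code in the panic message.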
Geert Uytterhoeven Nov. 19, 2024, 7:25 p.m. UTC | #8
Hi Jens,

On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
> > On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> >>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>> On 11/19/24 08:02, Jens Axboe wrote:
> >>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >>>>>>> Doesn't matter right now as there's still some bytes left for it, but
> >>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
> >>>>>>> freeptr offset for it.
> >>>>>>>
> >>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>>>
> >>>>>> This patch triggers:
> >>>>>>
> >>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >>>>>> Stack from 00c63e5c:
> >>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >>>>>>   [<004ae21e>] panic+0xc4/0x252
> >>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >>>>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >>>>>>   [<004adb58>] memset+0x0/0x8c
> >>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
> >>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
> >>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
> >>>>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
> >>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>>>   [<004b912e>] kernel_init+0x14/0xec
> >>>>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >>>>>>
> >>>>>> when trying to boot the m68k:q800 machine in qemu.
> >>>>>>
> >>>>>> An added debug message in create_cache() shows the reason:
> >>>>>>
> >>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >>>>>>
> >>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
> >>>>>> case on m68k.
> >>>>>
> >>>>> Why is ->work 2-byte aligned to begin with on m68k?!
> >>>>
> >>>> My understanding is that m68k does not align pointers.
> >>>
> >>> The minimum alignment for multi-byte integral values on m68k is
> >>> 2 bytes.
> >>>
> >>> See also the comment at
> >>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
> >>
> >> Maybe it's time we put m68k to bed? :-)
> >>
> >> We can add a forced alignment ->work to be 4 bytes, won't change
> >> anything on anything remotely current. But does feel pretty hacky to
> >> need to align based on some ancient thing.
> >
> > Why does freeptr_offset need to be 4-byte aligned?
>
> Didn't check, but it's slab/slub complaining using a 2-byte aligned
> address for the free pointer offset. It's explicitly checking:
>
>         /* If a custom freelist pointer is requested make sure it's sane. */
>         err = -EINVAL;
>         if (args->use_freeptr_offset &&
>             (args->freeptr_offset >= object_size ||
>              !(flags & SLAB_TYPESAFE_BY_RCU) ||
>              !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>                 goto out;

It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
(freeptr_t is sort of a long). If freeptr_offset must be a multiple of
4 or 8 bytes, the code that assigns it must make sure that is true.

I guess this is the code in fs/file_table.c:

    .freeptr_offset = offsetof(struct file, f_freeptr),

which references:

    include/linux/fs.h:           freeptr_t               f_freeptr;

I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
(or __aligned(sizeof(long))) to the definition of freeptr_t:

    include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
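
The difference between the two notions can be sketched as follows. The `demo_` types are hypothetical copies made for this example; the aligned variant uses the GCC/Clang `aligned` attribute and illustrates the idea only, it is not a tested kernel patch.

```c
#include <stdalign.h>
#include <stddef.h>

/* Same shape as the typedef quoted above. */
typedef struct { unsigned long v; } demo_freeptr_t;

/* Variant whose alignment is forced up to the size of its only member. */
typedef struct {
	unsigned long v;
} __attribute__((aligned(sizeof(unsigned long)))) demo_freeptr_aligned_t;

/* What the type itself guarantees for any offsetof() result. */
static int offset_ok_for_type(size_t offset)
{
	return offset % alignof(demo_freeptr_t) == 0;
}

/* The stricter test: offset must be a multiple of the object size. */
static int offset_ok_for_size(size_t offset)
{
	return offset % sizeof(demo_freeptr_t) == 0;
}
```

On most current ABIs the two tests agree because alignof(unsigned long) equals sizeof(unsigned long). They can differ where the ABI aligns a long to less than its size, which is the m68k case described in this thread.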

Gr{oetje,eeting}s,

                        Geert
Jens Axboe Nov. 19, 2024, 7:30 p.m. UTC | #9
On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
> Hi Jens,
> 
> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>> freeptr offset for it.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>
>>>>>>>> This patch triggers:
>>>>>>>>
>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>> Stack from 00c63e5c:
>>>>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>   [<004ae21e>] panic+0xc4/0x252
>>>>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>   [<004adb58>] memset+0x0/0x8c
>>>>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>   [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>
>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>
>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>
>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>
>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>> case on m68k.
>>>>>>>
>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>
>>>>>> My understanding is that m68k does not align pointers.
>>>>>
>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>> 2 bytes.
>>>>>
>>>>> See also the comment at
>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>
>>>> Maybe it's time we put m68k to bed? :-)
>>>>
>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>> anything on anything remotely current. But does feel pretty hacky to
>>>> need to align based on some ancient thing.
>>>
>>> Why does freeptr_offset need to be 4-byte aligned?
>>
>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>> address for the free pointer offset. It's explicitly checking:
>>
>>         /* If a custom freelist pointer is requested make sure it's sane. */
>>         err = -EINVAL;
>>         if (args->use_freeptr_offset &&
>>             (args->freeptr_offset >= object_size ||
>>              !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>              !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>                 goto out;
> 
> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
> (freeptr_t is sort of a long). If freeptr_offset must be a multiple of
> 4 or 8 bytes, the code that assigns it must make sure that is true.

Right, this is what the email is about...

> I guess this is the code in fs/file_table.c:
> 
>     .freeptr_offset = offsetof(struct file, f_freeptr),
> 
> which references:
> 
>     include/linux/fs.h:           freeptr_t               f_freeptr;
> 
> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
> (or __aligned(sizeof(long)) to the definition of freeptr_t:
> 
>     include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;

It's not, it's struct io_kiocb->work, as per the stack trace in this
email.
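
For reference, a minimal userspace sketch of the sanity check quoted above, fed with the values from the debug line (freeptr_offset=154, object_size=182, sizeof(freeptr_t)=4). The names demo_check_freeptr() and DEMO_IS_ALIGNED() are hypothetical stand-ins, not kernel symbols, and the SLAB_TYPESAFE_BY_RCU flag test is reduced to a plain boolean that is assumed to be set here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's IS_ALIGNED(); a must be a power of two. */
#define DEMO_IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/*
 * Hypothetical helper mirroring the quoted check. The parameters replace
 * args->freeptr_offset, object_size, the SLAB_TYPESAFE_BY_RCU flag test
 * and sizeof(freeptr_t).
 */
static int demo_check_freeptr(size_t freeptr_offset, size_t object_size,
			      bool typesafe_by_rcu, size_t freeptr_size)
{
	if (freeptr_offset >= object_size ||
	    !typesafe_by_rcu ||
	    !DEMO_IS_ALIGNED(freeptr_offset, freeptr_size))
		return -EINVAL;
	return 0;
}

/* Values from the debug line: offset 154, object size 182, 4-byte freeptr_t. */
static void demo_m68k_case(void)
{
	/* 154 % 4 == 2, so the alignment test rejects the offset. */
	assert(demo_check_freeptr(154, 182, true, 4) == -EINVAL);
	/* A 4-byte aligned offset such as 152 would pass the same check. */
	assert(demo_check_freeptr(152, 182, true, 4) == 0);
}
```

With these inputs only the IS_ALIGNED() term fails: the offset is below object_size, so the -EINVAL comes from the 2-byte placement of the member rather than from its size.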
Geert Uytterhoeven Nov. 19, 2024, 7:41 p.m. UTC | #10
Hi Jens,

On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
> > On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
> >>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> >>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>>> On 11/19/24 08:02, Jens Axboe wrote:
> >>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
> >>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
> >>>>>>>>> freeptr offset for it.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>>>>>
> >>>>>>>> This patch triggers:
> >>>>>>>>
> >>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >>>>>>>> Stack from 00c63e5c:
> >>>>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >>>>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >>>>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >>>>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >>>>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >>>>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >>>>>>>>   [<004ae21e>] panic+0xc4/0x252
> >>>>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >>>>>>>>   [<004adb58>] memset+0x0/0x8c
> >>>>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
> >>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
> >>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
> >>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
> >>>>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>>>>>   [<004b912e>] kernel_init+0x14/0xec
> >>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
> >>>>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >>>>>>>>
> >>>>>>>> when trying to boot the m68k:q800 machine in qemu.
> >>>>>>>>
> >>>>>>>> An added debug message in create_cache() shows the reason:
> >>>>>>>>
> >>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >>>>>>>>
> >>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
> >>>>>>>> case on m68k.
> >>>>>>>
> >>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
> >>>>>>
> >>>>>> My understanding is that m68k does not align pointers.
> >>>>>
> >>>>> The minimum alignment for multi-byte integral values on m68k is
> >>>>> 2 bytes.
> >>>>>
> >>>>> See also the comment at
> >>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
> >>>>
> >>>> Maybe it's time we put m68k to bed? :-)
> >>>>
> >>>> We can add a forced alignment ->work to be 4 bytes, won't change
> >>>> anything on anything remotely current. But does feel pretty hacky to
> >>>> need to align based on some ancient thing.
> >>>
> >>> Why does freeptr_offset need to be 4-byte aligned?
> >>
> >> Didn't check, but it's slab/slub complaining using a 2-byte aligned
> >> address for the free pointer offset. It's explicitly checking:
> >>
> >>         /* If a custom freelist pointer is requested make sure it's sane. */
> >>         err = -EINVAL;
> >>         if (args->use_freeptr_offset &&
> >>             (args->freeptr_offset >= object_size ||
> >>              !(flags & SLAB_TYPESAFE_BY_RCU) ||
> >>              !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
> >>                 goto out;
> >
> > It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
> > (free_ptr is sort of a long). If freeptr_offset must be a multiple of
> > 4 or 8 bytes,
> > the code that assigns it must make sure that is true.
>
> Right, this is what the email is about...
>
> > I guess this is the code in fs/file_table.c:
> >
> >     .freeptr_offset = offsetof(struct file, f_freeptr),
> >
> > which references:
> >
> >     include/linux/fs.h:           freeptr_t               f_freeptr;
> >
> > I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
> > (or __aligned(sizeof(long)) to the definition of freeptr_t:
> >
> >     include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>
> It's not, it's struct io_kiocb->work, as per the stack trace in this
> email.

Sorry, I was falling out of thin air into this thread...

linux-next/master:io_uring/io_uring.c:          .freeptr_offset = offsetof(struct io_kiocb, work),
linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,

Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
Isn't that a bit error-prone, as the slab core code expects a freeptr_t?

Gr{oetje,eeting}s,

                        Geert
Jens Axboe Nov. 19, 2024, 7:44 p.m. UTC | #11
On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
> Hi Jens,
> 
> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>
>>>>>>>>>> This patch triggers:
>>>>>>>>>>
>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>   [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>   [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>   [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>
>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>
>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>
>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>
>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>> case on m68k.
>>>>>>>>>
>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>
>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>
>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>> 2 bytes.
>>>>>>>
>>>>>>> See also the comment at
>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>
>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>
>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>> need to align based on some ancient thing.
>>>>>
>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>
>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>> address for the free pointer offset. It's explicitly checking:
>>>>
>>>>         /* If a custom freelist pointer is requested make sure it's sane. */
>>>>         err = -EINVAL;
>>>>         if (args->use_freeptr_offset &&
>>>>             (args->freeptr_offset >= object_size ||
>>>>              !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>              !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>                 goto out;
>>>
>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>> 4 or 8 bytes,
>>> the code that assigns it must make sure that is true.
>>
>> Right, this is what the email is about...
>>
>>> I guess this is the code in fs/file_table.c:
>>>
>>>     .freeptr_offset = offsetof(struct file, f_freeptr),
>>>
>>> which references:
>>>
>>>     include/linux/fs.h:           freeptr_t               f_freeptr;
>>>
>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>> (or __aligned(sizeof(long)) to the definition of freeptr_t:
>>>
>>>     include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>
>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>> email.
> 
> Sorry, I was falling out of thin air into this thread...
> 
> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
> offsetof(struct io_kiocb, work),
> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
> 
> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?

It just needs the space, should not matter otherwise. But may as well
just add the union and align the freeptr so it stops complaining on m68k.
Jens Axboe Nov. 19, 2024, 7:49 p.m. UTC | #12
On 11/19/24 12:44 PM, Jens Axboe wrote:
> On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
>> Hi Jens,
>>
>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>
>>>>>>>>>>> This patch triggers:
>>>>>>>>>>>
>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>>          00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>>          004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>>          00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>>          004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>>          00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>>          00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>>   [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>>   [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>   [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>>   [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>>   [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>   [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>   [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>>   [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>   [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>   [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>>   [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>>   [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>   [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>>   [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>   [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>>
>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>>
>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>>
>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>>
>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>>> case on m68k.
>>>>>>>>>>
>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>>
>>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>>
>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>>> 2 bytes.
>>>>>>>>
>>>>>>>> See also the comment at
>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>>
>>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>>
>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>>> need to align based on some ancient thing.
>>>>>>
>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>>
>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>>> address for the free pointer offset. It's explicitly checking:
>>>>>
>>>>>         /* If a custom freelist pointer is requested make sure it's sane. */
>>>>>         err = -EINVAL;
>>>>>         if (args->use_freeptr_offset &&
>>>>>             (args->freeptr_offset >= object_size ||
>>>>>              !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>>              !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>>                 goto out;
>>>>
>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>>> 4 or 8 bytes,
>>>> the code that assigns it must make sure that is true.
>>>
>>> Right, this is what the email is about...
>>>
>>>> I guess this is the code in fs/file_table.c:
>>>>
>>>>     .freeptr_offset = offsetof(struct file, f_freeptr),
>>>>
>>>> which references:
>>>>
>>>>     include/linux/fs.h:           freeptr_t               f_freeptr;
>>>>
>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>>> (or __aligned(sizeof(long)) to the definition of freeptr_t:
>>>>
>>>>     include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>>
>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>>> email.
>>
>> Sorry, I was falling out of thin air into this thread...
>>
>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>> offsetof(struct io_kiocb, work),
>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>>
>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
> 
> It just needs the space, should not matter otherwise. But may as well
> just add the union and align the freeptr so it stop complaining on m68k.

Ala the below, perhaps alignment takes care of itself then?


diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 593c10a02144..a83ec7f7849d 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -674,7 +674,11 @@ struct io_kiocb {
 	struct io_kiocb			*link;
 	/* custom credentials, valid IFF REQ_F_CREDS is set */
 	const struct cred		*creds;
-	struct io_wq_work		work;
+
+	union {
+		struct io_wq_work	work;
+		freeptr_t		freeptr;
+	};
 
 	struct {
 		u64			extra1;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 73af59863300..86ac7df2a601 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
 	struct kmem_cache_args kmem_args = {
 		.useroffset = offsetof(struct io_kiocb, cmd.data),
 		.usersize = sizeof_field(struct io_kiocb, cmd.data),
-		.freeptr_offset = offsetof(struct io_kiocb, work),
+		.freeptr_offset = offsetof(struct io_kiocb, freeptr),
 		.use_freeptr_offset = true,
 	};
Guenter Roeck Nov. 19, 2024, 9:46 p.m. UTC | #13
On 11/19/24 11:49, Jens Axboe wrote:
> On 11/19/24 12:44 PM, Jens Axboe wrote:
>> On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
>>> Hi Jens,
>>>
>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>
>>>>>>>>>>>> This patch triggers:
>>>>>>>>>>>>
>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>>>           00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>>>           004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>>>           00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>>>           004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>>>           00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>>>           00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>>>    [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>>>    [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>    [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>>>    [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>>>    [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>    [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>    [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>    [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>>>    [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>    [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>    [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>>>
>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>>>
>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>>>
>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>>>
>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>>>> case on m68k.
>>>>>>>>>>>
>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>>>
>>>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>>>
>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>>>> 2 bytes.
>>>>>>>>>
>>>>>>>>> See also the comment at
>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>>>
>>>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>>>
>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>>>> need to align based on some ancient thing.
>>>>>>>
>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>>>
>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>>>> address for the free pointer offset. It's explicitly checking:
>>>>>>
>>>>>>          /* If a custom freelist pointer is requested make sure it's sane. */
>>>>>>          err = -EINVAL;
>>>>>>          if (args->use_freeptr_offset &&
>>>>>>              (args->freeptr_offset >= object_size ||
>>>>>>               !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>>>               !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>>>                  goto out;
>>>>>
>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>>>> 4 or 8 bytes,
>>>>> the code that assigns it must make sure that is true.
>>>>
>>>> Right, this is what the email is about...
>>>>
>>>>> I guess this is the code in fs/file_table.c:
>>>>>
>>>>>      .freeptr_offset = offsetof(struct file, f_freeptr),
>>>>>
>>>>> which references:
>>>>>
>>>>>      include/linux/fs.h:           freeptr_t               f_freeptr;
>>>>>
>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>>>> (or __aligned(sizeof(long)) to the definition of freeptr_t:
>>>>>
>>>>>      include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>>>
>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>>>> email.
>>>
>>> Sorry, I was falling out of thin air into this thread...
>>>
>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>>> offsetof(struct io_kiocb, work),
>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>>>
>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
>>
>> It just needs the space, should not matter otherwise. But may as well
>> just add the union and align the freeptr so it stop complaining on m68k.
> 
> Ala the below, perhaps alignment takes care of itself then?
> 

No, that doesn't work (I tried), at least not on its own, because the pointer
is still unaligned on m68k.

Guenter

> 
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 593c10a02144..a83ec7f7849d 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -674,7 +674,11 @@ struct io_kiocb {
>   	struct io_kiocb			*link;
>   	/* custom credentials, valid IFF REQ_F_CREDS is set */
>   	const struct cred		*creds;
> -	struct io_wq_work		work;
> +
> +	union {
> +		struct io_wq_work	work;
> +		freeptr_t		freeptr;
> +	};
>   
>   	struct {
>   		u64			extra1;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 73af59863300..86ac7df2a601 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
>   	struct kmem_cache_args kmem_args = {
>   		.useroffset = offsetof(struct io_kiocb, cmd.data),
>   		.usersize = sizeof_field(struct io_kiocb, cmd.data),
> -		.freeptr_offset = offsetof(struct io_kiocb, work),
> +		.freeptr_offset = offsetof(struct io_kiocb, freeptr),
>   		.use_freeptr_offset = true,
>   	};
>   
>
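
A minimal sketch of why the union on its own leaves the offset unchanged, assuming GCC or Clang. To reproduce the m68k layout on an arbitrary host, the typedef demo_ulong lowers the alignment of unsigned long to 2 bytes; demo_freeptr_t, struct demo_work and the two demo_req_* structs are hypothetical stand-ins for freeptr_t, struct io_wq_work and struct io_kiocb, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: emulate a 2-byte-aligned long (as on m68k) on any GCC/Clang host. */
typedef unsigned long demo_ulong __attribute__((aligned(2)));

/* Hypothetical stand-ins for freeptr_t and struct io_wq_work. */
typedef struct { demo_ulong v; } demo_freeptr_t;
struct demo_work { demo_ulong a; demo_ulong b; };

/* Union only: it inherits the 2-byte alignment of its members. */
struct demo_req_union {
	unsigned short pad;	/* leaves the next member at offset 2 */
	union {
		struct demo_work work;
		demo_freeptr_t   freeptr;
	};
};

/* Union plus a forced alignment on the freeptr member. */
struct demo_req_forced {
	unsigned short pad;
	union {
		struct demo_work work;
		demo_freeptr_t   freeptr
			__attribute__((aligned(sizeof(demo_freeptr_t))));
	};
};

/* Remainder of the freeptr offset modulo sizeof(freeptr); 0 means aligned. */
static size_t demo_union_misalignment(void)
{
	return offsetof(struct demo_req_union, freeptr) % sizeof(demo_freeptr_t);
}

static size_t demo_forced_misalignment(void)
{
	return offsetof(struct demo_req_forced, freeptr) % sizeof(demo_freeptr_t);
}
```

In this sketch demo_union_misalignment() is non-zero because a union is only as aligned as its strictest member, while demo_forced_misalignment() is zero because the attribute raises the alignment of the whole union. Forcing the alignment on the member is one possible remedy under these assumptions, not necessarily the one that ends up being merged.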
Jens Axboe Nov. 19, 2024, 10:30 p.m. UTC | #14
On 11/19/24 2:46 PM, Guenter Roeck wrote:
> On 11/19/24 11:49, Jens Axboe wrote:
>> On 11/19/24 12:44 PM, Jens Axboe wrote:
>>> On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
>>>> Hi Jens,
>>>>
>>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>>
>>>>>>>>>>>>> This patch triggers:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>>>>           00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>>>>           004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>>>>           00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>>>>           004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>>>>           00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>>>>           00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>>>>    [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>>>>    [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>    [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>>>>    [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>>>>    [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>    [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>    [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>    [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>>>>    [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>    [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>    [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>>>>
>>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>>>>
>>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>>>>
>>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>>>>
>>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>>>>> case on m68k.
>>>>>>>>>>>>
>>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>>>>
>>>>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>>>>
>>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>>>>> 2 bytes.
>>>>>>>>>>
>>>>>>>>>> See also the comment at
>>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>>>>
>>>>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>>>>
>>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>>>>> need to align based on some ancient thing.
>>>>>>>>
>>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>>>>
>>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>>>>> address for the free pointer offset. It's explicitly checking:
>>>>>>>
>>>>>>>          /* If a custom freelist pointer is requested make sure it's sane. */
>>>>>>>          err = -EINVAL;
>>>>>>>          if (args->use_freeptr_offset &&
>>>>>>>              (args->freeptr_offset >= object_size ||
>>>>>>>               !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>>>>               !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>>>>                  goto out;
>>>>>>
>>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>>>>> 4 or 8 bytes,
>>>>>> the code that assigns it must make sure that is true.
>>>>>
>>>>> Right, this is what the email is about...
>>>>>
>>>>>> I guess this is the code in fs/file_table.c:
>>>>>>
>>>>>>      .freeptr_offset = offsetof(struct file, f_freeptr),
>>>>>>
>>>>>> which references:
>>>>>>
>>>>>>      include/linux/fs.h:           freeptr_t               f_freeptr;
>>>>>>
>>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>>>>> (or __aligned(sizeof(long))) to the definition of freeptr_t:
>>>>>>
>>>>>>      include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>>>>
>>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>>>>> email.
>>>>
>>>> Sorry, I was falling out of thin air into this thread...
>>>>
>>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>>>> offsetof(struct io_kiocb, work),
>>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>>>>
>>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
>>>
>>> It just needs the space, should not matter otherwise. But may as well
>>> just add the union and align the freeptr so it stops complaining on m68k.
>>
>> Ala the below, perhaps alignment takes care of itself then?
>>
> 
> No, that doesn't work (I tried), at least not on its own, because the pointer
> is still unaligned on m68k.

Yeah we'll likely need to force it. The below should work, I presume?
Feels pretty odd to have to align it to the size of it, when that should
naturally occur... Crusty legacy archs.

diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
index 593c10a02144..8ed9c6923668 100644
--- a/include/linux/io_uring_types.h
+++ b/include/linux/io_uring_types.h
@@ -674,7 +674,11 @@ struct io_kiocb {
 	struct io_kiocb			*link;
 	/* custom credentials, valid IFF REQ_F_CREDS is set */
 	const struct cred		*creds;
-	struct io_wq_work		work;
+
+	union {
+		struct io_wq_work	work;
+		freeptr_t		freeptr __aligned(sizeof(freeptr_t));
+	};
 
 	struct {
 		u64			extra1;
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 73af59863300..86ac7df2a601 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
 	struct kmem_cache_args kmem_args = {
 		.useroffset = offsetof(struct io_kiocb, cmd.data),
 		.usersize = sizeof_field(struct io_kiocb, cmd.data),
-		.freeptr_offset = offsetof(struct io_kiocb, work),
+		.freeptr_offset = offsetof(struct io_kiocb, freeptr),
 		.use_freeptr_offset = true,
 	};
Guenter Roeck Nov. 20, 2024, 12:08 a.m. UTC | #15
On 11/19/24 14:30, Jens Axboe wrote:
> On 11/19/24 2:46 PM, Guenter Roeck wrote:
>> On 11/19/24 11:49, Jens Axboe wrote:
>>> On 11/19/24 12:44 PM, Jens Axboe wrote:
>>>> On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
>>>>> Hi Jens,
>>>>>
>>>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patch triggers:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>>>>>            00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>>>>>            004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>>>>>            00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>>>>>            004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>>>>>            00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>>>>>            00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>>>>>     [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>>>>>     [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>>>>>     [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>>     [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>>     [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>>>>>     [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>>>>>     [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>     [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>     [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>>>>>     [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>>     [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>>     [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>>>>>     [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>     [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>>     [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>>>>>     [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>>     [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>>>>>> case on m68k.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>>>>>
>>>>>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>>>>>
>>>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>>>>>> 2 bytes.
>>>>>>>>>>>
>>>>>>>>>>> See also the comment at
>>>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>>>>>
>>>>>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>>>>>
>>>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>>>>>> need to align based on some ancient thing.
>>>>>>>>>
>>>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>>>>>
>>>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>>>>>> address for the free pointer offset. It's explicitly checking:
>>>>>>>>
>>>>>>>>           /* If a custom freelist pointer is requested make sure it's sane. */
>>>>>>>>           err = -EINVAL;
>>>>>>>>           if (args->use_freeptr_offset &&
>>>>>>>>               (args->freeptr_offset >= object_size ||
>>>>>>>>                !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>>>>>                !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>>>>>                   goto out;
>>>>>>>
>>>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>>>>>> 4 or 8 bytes,
>>>>>>> the code that assigns it must make sure that is true.
>>>>>>
>>>>>> Right, this is what the email is about...
>>>>>>
>>>>>>> I guess this is the code in fs/file_table.c:
>>>>>>>
>>>>>>>       .freeptr_offset = offsetof(struct file, f_freeptr),
>>>>>>>
>>>>>>> which references:
>>>>>>>
>>>>>>>       include/linux/fs.h:           freeptr_t               f_freeptr;
>>>>>>>
>>>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>>>>>> (or __aligned(sizeof(long))) to the definition of freeptr_t:
>>>>>>>
>>>>>>>       include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>>>>>
>>>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>>>>>> email.
>>>>>
>>>>> Sorry, I was falling out of thin air into this thread...
>>>>>
>>>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>>>>> offsetof(struct io_kiocb, work),
>>>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>>>>>
>>>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>>>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
>>>>
>>>> It just needs the space, should not matter otherwise. But may as well
>>>> just add the union and align the freeptr so it stops complaining on m68k.
>>>
>>> Ala the below, perhaps alignment takes care of itself then?
>>>
>>
>> No, that doesn't work (I tried), at least not on its own, because the pointer
>> is still unaligned on m68k.
> 
> Yeah we'll likely need to force it. The below should work, I presume?
> Feels pretty odd to have to align it to the size of it, when that should
> naturally occur... Crusty legacy archs.
> 

Yes, that works. Feel free to add

Tested-by: Guenter Roeck <linux@roeck-us.net>

to an official patch.

Thanks,
Guenter

> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 593c10a02144..8ed9c6923668 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -674,7 +674,11 @@ struct io_kiocb {
>   	struct io_kiocb			*link;
>   	/* custom credentials, valid IFF REQ_F_CREDS is set */
>   	const struct cred		*creds;
> -	struct io_wq_work		work;
> +
> +	union {
> +		struct io_wq_work	work;
> +		freeptr_t		freeptr __aligned(sizeof(freeptr_t));
> +	};
>   
>   	struct {
>   		u64			extra1;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 73af59863300..86ac7df2a601 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
>   	struct kmem_cache_args kmem_args = {
>   		.useroffset = offsetof(struct io_kiocb, cmd.data),
>   		.usersize = sizeof_field(struct io_kiocb, cmd.data),
> -		.freeptr_offset = offsetof(struct io_kiocb, work),
> +		.freeptr_offset = offsetof(struct io_kiocb, freeptr),
>   		.use_freeptr_offset = true,
>   	};
>
Jens Axboe Nov. 20, 2024, 1:58 a.m. UTC | #16
On 11/19/24 5:08 PM, Guenter Roeck wrote:
> On 11/19/24 14:30, Jens Axboe wrote:
>> On 11/19/24 2:46 PM, Guenter Roeck wrote:
>>> On 11/19/24 11:49, Jens Axboe wrote:
>>>> On 11/19/24 12:44 PM, Jens Axboe wrote:
>>>>> On 11/19/24 12:41 PM, Geert Uytterhoeven wrote:
>>>>>> Hi Jens,
>>>>>>
>>>>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>>>>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>>>>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>>>>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>>>>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>>>>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>>>>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>>>>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>>>>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>>>>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>>>>>>>>>>>>>>>> freeptr offset for it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This patch triggers:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>>>>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>>>>>>>>>>>>>>> Stack from 00c63e5c:
>>>>>>>>>>>>>>>            00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>>>>>>>>>>>>>>>            004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>>>>>>>>>>>>>>>            00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>>>>>>>>>>>>>>>            004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>>>>>>>>>>>>>>>            00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>>>>>>>>>>>>>>>            00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>>>>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>>>>>>>>>>>>>>>     [<004ae21e>] panic+0xc4/0x252
>>>>>>>>>>>>>>>     [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>>>>>>>>>>>>>>>     [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>>>     [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>>>     [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>>>>>>>>>>>>>>>     [<004adb58>] memset+0x0/0x8c
>>>>>>>>>>>>>>>     [<0076f28a>] io_uring_init+0x4c/0xca
>>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>>     [<000020e0>] do_one_initcall+0x32/0x192
>>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>>     [<0000211c>] do_one_initcall+0x6e/0x192
>>>>>>>>>>>>>>>     [<004a72c2>] strcpy+0x0/0x1c
>>>>>>>>>>>>>>>     [<0002cb62>] parse_args+0x0/0x1f2
>>>>>>>>>>>>>>>     [<000020ae>] do_one_initcall+0x0/0x192
>>>>>>>>>>>>>>>     [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>>>>>>>>>>>>>>>     [<0076f23e>] io_uring_init+0x0/0xca
>>>>>>>>>>>>>>>     [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>>>     [<004b912e>] kernel_init+0x14/0xec
>>>>>>>>>>>>>>>     [<004b911a>] kernel_init+0x0/0xec
>>>>>>>>>>>>>>>     [<0000252c>] ret_from_kernel_thread+0xc/0x14
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>>>>>>>>>>>>>>> case on m68k.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>>>>>>>>>>>>>
>>>>>>>>>>>>> My understanding is that m68k does not align pointers.
>>>>>>>>>>>>
>>>>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>>>>>>>>>>>> 2 bytes.
>>>>>>>>>>>>
>>>>>>>>>>>> See also the comment at
>>>>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>>>>>>>>>>>
>>>>>>>>>>> Maybe it's time we put m68k to bed? :-)
>>>>>>>>>>>
>>>>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>>>>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>>>>>>>>>>> need to align based on some ancient thing.
>>>>>>>>>>
>>>>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>>>>>>>>>
>>>>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>>>>>>>>> address for the free pointer offset. It's explicitly checking:
>>>>>>>>>
>>>>>>>>>           /* If a custom freelist pointer is requested make sure it's sane. */
>>>>>>>>>           err = -EINVAL;
>>>>>>>>>           if (args->use_freeptr_offset &&
>>>>>>>>>               (args->freeptr_offset >= object_size ||
>>>>>>>>>                !(flags & SLAB_TYPESAFE_BY_RCU) ||
>>>>>>>>>                !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>>>>>>>>>                   goto out;
>>>>>>>>
>>>>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>>>>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>>>>>>>> 4 or 8 bytes,
>>>>>>>> the code that assigns it must make sure that is true.
>>>>>>>
>>>>>>> Right, this is what the email is about...
>>>>>>>
>>>>>>>> I guess this is the code in fs/file_table.c:
>>>>>>>>
>>>>>>>>       .freeptr_offset = offsetof(struct file, f_freeptr),
>>>>>>>>
>>>>>>>> which references:
>>>>>>>>
>>>>>>>>       include/linux/fs.h:           freeptr_t               f_freeptr;
>>>>>>>>
>>>>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>>>>>>>> (or __aligned(sizeof(long))) to the definition of freeptr_t:
>>>>>>>>
>>>>>>>>       include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>>>>>>>
>>>>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>>>>>>> email.
>>>>>>
>>>>>> Sorry, I was falling out of thin air into this thread...
>>>>>>
>>>>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>>>>>> offsetof(struct io_kiocb, work),
>>>>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>>>>>>
>>>>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>>>>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
>>>>>
>>>>> It just needs the space, should not matter otherwise. But may as well
>>>>> just add the union and align the freeptr so it stops complaining on m68k.
>>>>
>>>> Ala the below, perhaps alignment takes care of itself then?
>>>>
>>>
>>> No, that doesn't work (I tried), at least not on its own, because the pointer
>>> is still unaligned on m68k.
>>
>> Yeah we'll likely need to force it. The below should work, I presume?
>> Feels pretty odd to have to align it to the size of it, when that should
>> naturally occur... Crusty legacy archs.
>>
> 
> Yes, that works. Feel free to add
> 
> Tested-by: Guenter Roeck <linux@roeck-us.net>
> 
> to an official patch.

Thanks for testing, will add that and send it out (and queue it up for
later this merge window).
Geert Uytterhoeven Nov. 20, 2024, 8:19 a.m. UTC | #17
Hi Jens,

CC Christian (who added the check)
CC Vlastimil (who suggested the check)

On Tue, Nov 19, 2024 at 11:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> On 11/19/24 2:46 PM, Guenter Roeck wrote:
> > On 11/19/24 11:49, Jens Axboe wrote:
> >> On 11/19/24 12:44 PM, Jens Axboe wrote:
> >>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
> >>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
> >>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
> >>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> >>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
> >>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
> >>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
> >>>>>>>>>>>>>> freeptr offset for it.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> This patch triggers:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >>>>>>>>>>>>> Stack from 00c63e5c:
> >>>>>>>>>>>>>           00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >>>>>>>>>>>>>           004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >>>>>>>>>>>>>           00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >>>>>>>>>>>>>           004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >>>>>>>>>>>>>           00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >>>>>>>>>>>>>           00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >>>>>>>>>>>>>    [<004ae21e>] panic+0xc4/0x252
> >>>>>>>>>>>>>    [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>>>>>>>>    [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >>>>>>>>>>>>>    [<004adb58>] memset+0x0/0x8c
> >>>>>>>>>>>>>    [<0076f28a>] io_uring_init+0x4c/0xca
> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>>>>>>    [<000020e0>] do_one_initcall+0x32/0x192
> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>>>>>>    [<0000211c>] do_one_initcall+0x6e/0x192
> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
> >>>>>>>>>>>>>    [<000020ae>] do_one_initcall+0x0/0x192
> >>>>>>>>>>>>>    [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
> >>>>>>>>>>>>>    [<004b912e>] kernel_init+0x14/0xec
> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
> >>>>>>>>>>>>>    [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
> >>>>>>>>>>>>> case on m68k.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
> >>>>>>>>>>>
> >>>>>>>>>>> My understanding is that m68k does not align pointers.
> >>>>>>>>>>
> >>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
> >>>>>>>>>> 2 bytes.
> >>>>>>>>>>
> >>>>>>>>>> See also the comment at
> >>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
> >>>>>>>>>
> >>>>>>>>> Maybe it's time we put m68k to bed? :-)
> >>>>>>>>>
> >>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
> >>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
> >>>>>>>>> need to align based on some ancient thing.
> >>>>>>>>
> >>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
> >>>>>>>
> >>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
> >>>>>>> address for the free pointer offset. It's explicitly checking:
> >>>>>>>
> >>>>>>>          /* If a custom freelist pointer is requested make sure it's sane. */
> >>>>>>>          err = -EINVAL;
> >>>>>>>          if (args->use_freeptr_offset &&
> >>>>>>>              (args->freeptr_offset >= object_size ||
> >>>>>>>               !(flags & SLAB_TYPESAFE_BY_RCU) ||
> >>>>>>>               !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
                                                          ^^^^^^

> >>>>>>>                  goto out;
> >>>>>>
> >>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
> >>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
> >>>>>> 4 or 8 bytes,
> >>>>>> the code that assigns it must make sure that is true.
> >>>>>
> >>>>> Right, this is what the email is about...
> >>>>>
> >>>>>> I guess this is the code in fs/file_table.c:
> >>>>>>
> >>>>>>      .freeptr_offset = offsetof(struct file, f_freeptr),
> >>>>>>
> >>>>>> which references:
> >>>>>>
> >>>>>>      include/linux/fs.h:           freeptr_t               f_freeptr;
> >>>>>>
> >>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
> >>>>>> (or __aligned(sizeof(long))) to the definition of freeptr_t:
> >>>>>>
> >>>>>>      include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
> >>>>>
> >>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
> >>>>> email.
> >>>>
> >>>> Sorry, I was falling out of thin air into this thread...
> >>>>
> >>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
> >>>> offsetof(struct io_kiocb, work),
> >>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
> >>>>
> >>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
> >>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
> >>>
> >>> It just needs the space, should not matter otherwise. But may as well
> >>> just add the union and align the freeptr so it stops complaining on m68k.
> >>
> >> Ala the below, perhaps alignment takes care of itself then?
> >
> > No, that doesn't work (I tried), at least not on its own, because the pointer
> > is still unaligned on m68k.
>
> Yeah we'll likely need to force it. The below should work, I presume?
> Feels pretty odd to have to align it to the size of it, when that should
> naturally occur... Crusty legacy archs.
>
> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> index 593c10a02144..8ed9c6923668 100644
> --- a/include/linux/io_uring_types.h
> +++ b/include/linux/io_uring_types.h
> @@ -674,7 +674,11 @@ struct io_kiocb {
>         struct io_kiocb                 *link;
>         /* custom credentials, valid IFF REQ_F_CREDS is set */
>         const struct cred               *creds;
> -       struct io_wq_work               work;
> +
> +       union {
> +               struct io_wq_work       work;
> +               freeptr_t               freeptr __aligned(sizeof(freeptr_t));

I'd rather add the __aligned() to the definition of freeptr_t, so it
applies to all (future) users.

But my main question remains: why is the slab code checking
IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))?
Perhaps that was just intended to be __alignof__ instead of sizeof()?

> +       };
>
>         struct {
>                 u64                     extra1;
> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
> index 73af59863300..86ac7df2a601 100644
> --- a/io_uring/io_uring.c
> +++ b/io_uring/io_uring.c
> @@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
>         struct kmem_cache_args kmem_args = {
>                 .useroffset = offsetof(struct io_kiocb, cmd.data),
>                 .usersize = sizeof_field(struct io_kiocb, cmd.data),
> -               .freeptr_offset = offsetof(struct io_kiocb, work),
> +               .freeptr_offset = offsetof(struct io_kiocb, freeptr),
>                 .use_freeptr_offset = true,
>         };

Gr{oetje,eeting}s,

                        Geert
Vlastimil Babka Nov. 20, 2024, 8:47 a.m. UTC | #18
On 11/20/24 09:19, Geert Uytterhoeven wrote:
> Hi Jens,
> 
> CC Christian (who added the check)
> CC Vlastimil (who suggested the check)
> 
> On Tue, Nov 19, 2024 at 11:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>> On 11/19/24 2:46 PM, Guenter Roeck wrote:
>> > On 11/19/24 11:49, Jens Axboe wrote:
>> >> On 11/19/24 12:44 PM, Jens Axboe wrote:
>> >>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
>> >>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
>> >>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
>> >>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
>> >>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
>> >>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
>> >>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
>> >>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
>> >>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
>> >>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
>> >>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
>> >>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
>> >>>>>>>>>>>>>> freeptr offset for it.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> This patch triggers:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
>> >>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
>> >>>>>>>>>>>>> Stack from 00c63e5c:
>> >>>>>>>>>>>>>           00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
>> >>>>>>>>>>>>>           004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
>> >>>>>>>>>>>>>           00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
>> >>>>>>>>>>>>>           004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
>> >>>>>>>>>>>>>           00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
>> >>>>>>>>>>>>>           00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
>> >>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
>> >>>>>>>>>>>>>    [<004ae21e>] panic+0xc4/0x252
>> >>>>>>>>>>>>>    [<000c6974>] __kmem_cache_create_args+0x216/0x26c
>> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>> >>>>>>>>>>>>>    [<000c675e>] __kmem_cache_create_args+0x0/0x26c
>> >>>>>>>>>>>>>    [<004adb58>] memset+0x0/0x8c
>> >>>>>>>>>>>>>    [<0076f28a>] io_uring_init+0x4c/0xca
>> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>> >>>>>>>>>>>>>    [<000020e0>] do_one_initcall+0x32/0x192
>> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>> >>>>>>>>>>>>>    [<0000211c>] do_one_initcall+0x6e/0x192
>> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
>> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
>> >>>>>>>>>>>>>    [<000020ae>] do_one_initcall+0x0/0x192
>> >>>>>>>>>>>>>    [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
>> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
>> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>> >>>>>>>>>>>>>    [<004b912e>] kernel_init+0x14/0xec
>> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
>> >>>>>>>>>>>>>    [<0000252c>] ret_from_kernel_thread+0xc/0x14
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
>> >>>>>>>>>>>>> case on m68k.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
>> >>>>>>>>>>>
>> >>>>>>>>>>> My understanding is that m68k does not align pointers.
>> >>>>>>>>>>
>> >>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
>> >>>>>>>>>> 2 bytes.
>> >>>>>>>>>>
>> >>>>>>>>>> See also the comment at
>> >>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
>> >>>>>>>>>
>> >>>>>>>>> Maybe it's time we put m68k to bed? :-)
>> >>>>>>>>>
>> >>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
>> >>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
>> >>>>>>>>> need to align based on some ancient thing.
>> >>>>>>>>
>> >>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
>> >>>>>>>
>> >>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
>> >>>>>>> address for the free pointer offset. It's explicitly checking:
>> >>>>>>>
>> >>>>>>>          /* If a custom freelist pointer is requested make sure it's sane. */
>> >>>>>>>          err = -EINVAL;
>> >>>>>>>          if (args->use_freeptr_offset &&
>> >>>>>>>              (args->freeptr_offset >= object_size ||
>> >>>>>>>               !(flags & SLAB_TYPESAFE_BY_RCU) ||
>> >>>>>>>               !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
>                                                           ^^^^^^
> 
>> >>>>>>>                  goto out;
>> >>>>>>
>> >>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
>> >>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
>> >>>>>> 4 or 8 bytes,
>> >>>>>> the code that assigns it must make sure that is true.
>> >>>>>
>> >>>>> Right, this is what the email is about...
>> >>>>>
>> >>>>>> I guess this is the code in fs/file_table.c:
>> >>>>>>
>> >>>>>>      .freeptr_offset = offsetof(struct file, f_freeptr),
>> >>>>>>
>> >>>>>> which references:
>> >>>>>>
>> >>>>>>      include/linux/fs.h:           freeptr_t               f_freeptr;
>> >>>>>>
>> >>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
>> >>>>>> (or __aligned(sizeof(long)) to the definition of freeptr_t:
>> >>>>>>
>> >>>>>>      include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
>> >>>>>
>> >>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
>> >>>>> email.
>> >>>>
>> >>>> Sorry, I was falling out of thin air into this thread...
>> >>>>
>> >>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
>> >>>> offsetof(struct io_kiocb, work),
>> >>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
>> >>>>
>> >>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
>> >>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
>> >>>
>> >>> It just needs the space, should not matter otherwise. But may as well
>> >>> just add the union and align the freeptr so it stops complaining on m68k.
>> >>
>> >> Ala the below, perhaps alignment takes care of itself then?
>> >
>> > No, that doesn't work (I tried), at least not on its own, because the pointer
>> > is still unaligned on m68k.
>>
>> Yeah we'll likely need to force it. The below should work, I presume?
>> Feels pretty odd to have to align it to the size of it, when that should
>> naturally occur... Crusty legacy archs.
>>
>> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> index 593c10a02144..8ed9c6923668 100644
>> --- a/include/linux/io_uring_types.h
>> +++ b/include/linux/io_uring_types.h
>> @@ -674,7 +674,11 @@ struct io_kiocb {
>>         struct io_kiocb                 *link;
>>         /* custom credentials, valid IFF REQ_F_CREDS is set */
>>         const struct cred               *creds;
>> -       struct io_wq_work               work;
>> +
>> +       union {
>> +               struct io_wq_work       work;
>> +               freeptr_t               freeptr __aligned(sizeof(freeptr_t));
> 
> I'd rather add the __aligned() to the definition of freeptr_t, so it
> applies to all (future) users.
> 
> But my main question stays: why is the slab code checking
> IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t)?

I believe it's to match how SLUB normally calculates the offset if no
explicit one is given, in calculate_sizes():

s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));

Yes, there's a sizeof(void *) because the free pointer used to be just that, and
we forgot to update this place when freeptr_t was introduced (by Jann in
44f6a42d49350) for handling CONFIG_SLAB_FREELIST_HARDENED. In
get_freepointer() you can see that it is eventually cast to a pointer.
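
For illustration only, the default placement quoted above can be reproduced in
userspace. This is a minimal sketch, not the kernel code: align_down() is a
local stand-in for the kernel's ALIGN_DOWN() (power-of-two alignments only),
and default_freeptr_offset() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Local stand-in for the kernel's ALIGN_DOWN(); only valid when "a" is a
 * power of two.
 */
static size_t align_down(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);	/* power of two only */
	return x & ~(a - 1);
}

/*
 * Hypothetical helper: the offset chosen when no explicit freeptr_offset is
 * given, i.e. the middle of the object rounded down to the pointer size.
 */
static size_t default_freeptr_offset(size_t object_size, size_t ptr_size)
{
	return align_down(object_size / 2, ptr_size);
}
```

With the values from the debug message (object_size=182, 4-byte pointers),
182 / 2 = 91 is rounded down to 88, so the default offset is always a multiple
of the pointer size regardless of the type's alignment requirement.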

Does m68k have different alignment for pointers and unsigned long, or are both
2 bytes? Or any other arch, i.e. should freeptr_t be a union of
unsigned long and void * instead? (Or doesn't it matter?)

> Perhaps that was just intended to be __alignof__ instead of sizeof()?

Would it do the right thing everywhere, given the explanation above?

Thanks,
Vlastimil

>> +       };
>>
>>         struct {
>>                 u64                     extra1;
>> diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
>> index 73af59863300..86ac7df2a601 100644
>> --- a/io_uring/io_uring.c
>> +++ b/io_uring/io_uring.c
>> @@ -3812,7 +3812,7 @@ static int __init io_uring_init(void)
>>         struct kmem_cache_args kmem_args = {
>>                 .useroffset = offsetof(struct io_kiocb, cmd.data),
>>                 .usersize = sizeof_field(struct io_kiocb, cmd.data),
>> -               .freeptr_offset = offsetof(struct io_kiocb, work),
>> +               .freeptr_offset = offsetof(struct io_kiocb, freeptr),
>>                 .use_freeptr_offset = true,
>>         };
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
>
Geert Uytterhoeven Nov. 20, 2024, 9:07 a.m. UTC | #19
Hi Vlastimil,

On Wed, Nov 20, 2024 at 9:47 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 11/20/24 09:19, Geert Uytterhoeven wrote:
> > On Tue, Nov 19, 2024 at 11:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> On 11/19/24 2:46 PM, Guenter Roeck wrote:
> >> > On 11/19/24 11:49, Jens Axboe wrote:
> >> >> On 11/19/24 12:44 PM, Jens Axboe wrote:
> >> >>>> On Tue, Nov 19, 2024 at 8:30 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> >>>>> On 11/19/24 12:25 PM, Geert Uytterhoeven wrote:
> >> >>>>>> On Tue, Nov 19, 2024 at 8:10 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> >>>>>>> On 11/19/24 12:02 PM, Geert Uytterhoeven wrote:
> >> >>>>>>>> On Tue, Nov 19, 2024 at 8:00 PM Jens Axboe <axboe@kernel.dk> wrote:
> >> >>>>>>>>> On 11/19/24 10:49 AM, Geert Uytterhoeven wrote:
> >> >>>>>>>>>> On Tue, Nov 19, 2024 at 5:21 PM Guenter Roeck <linux@roeck-us.net> wrote:
> >> >>>>>>>>>>> On 11/19/24 08:02, Jens Axboe wrote:
> >> >>>>>>>>>>>> On 11/19/24 8:36 AM, Guenter Roeck wrote:
> >> >>>>>>>>>>>>> On Tue, Oct 29, 2024 at 09:16:32AM -0600, Jens Axboe wrote:
> >> >>>>>>>>>>>>>> Doesn't matter right now as there's still some bytes left for it, but
> >> >>>>>>>>>>>>>> let's prepare for the io_kiocb potentially growing and add a specific
> >> >>>>>>>>>>>>>> freeptr offset for it.
> >> >>>>>>>>>>>>>>
> >> >>>>>>>>>>>>>> Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> This patch triggers:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22
> >> >>>>>>>>>>>>> CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-mac-00971-g158f238aa69d #1
> >> >>>>>>>>>>>>> Stack from 00c63e5c:
> >> >>>>>>>>>>>>>           00c63e5c 00612c1c 00612c1c 00000300 00000001 005f3ce6 004b9044 00612c1c
> >> >>>>>>>>>>>>>           004ae21e 00000310 000000b6 005f3ce6 005f3ce6 ffffffea ffffffea 00797244
> >> >>>>>>>>>>>>>           00c63f20 000c6974 005ee588 004c9051 005f3ce6 ffffffea 000000a5 00c614a0
> >> >>>>>>>>>>>>>           004a72c2 0002cb62 000c675e 004adb58 0076f28a 005f3ce6 000000b6 00c63ef4
> >> >>>>>>>>>>>>>           00000310 00c63ef4 00000000 00000016 0076f23e 00c63f4c 00000010 00000004
> >> >>>>>>>>>>>>>           00000038 0000009a 01000000 00000000 00000000 00000000 000020e0 0076f23e
> >> >>>>>>>>>>>>> Call Trace: [<004b9044>] dump_stack+0xc/0x10
> >> >>>>>>>>>>>>>    [<004ae21e>] panic+0xc4/0x252
> >> >>>>>>>>>>>>>    [<000c6974>] __kmem_cache_create_args+0x216/0x26c
> >> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
> >> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
> >> >>>>>>>>>>>>>    [<000c675e>] __kmem_cache_create_args+0x0/0x26c
> >> >>>>>>>>>>>>>    [<004adb58>] memset+0x0/0x8c
> >> >>>>>>>>>>>>>    [<0076f28a>] io_uring_init+0x4c/0xca
> >> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >> >>>>>>>>>>>>>    [<000020e0>] do_one_initcall+0x32/0x192
> >> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >> >>>>>>>>>>>>>    [<0000211c>] do_one_initcall+0x6e/0x192
> >> >>>>>>>>>>>>>    [<004a72c2>] strcpy+0x0/0x1c
> >> >>>>>>>>>>>>>    [<0002cb62>] parse_args+0x0/0x1f2
> >> >>>>>>>>>>>>>    [<000020ae>] do_one_initcall+0x0/0x192
> >> >>>>>>>>>>>>>    [<0075c4e2>] kernel_init_freeable+0x1a0/0x1a4
> >> >>>>>>>>>>>>>    [<0076f23e>] io_uring_init+0x0/0xca
> >> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
> >> >>>>>>>>>>>>>    [<004b912e>] kernel_init+0x14/0xec
> >> >>>>>>>>>>>>>    [<004b911a>] kernel_init+0x0/0xec
> >> >>>>>>>>>>>>>    [<0000252c>] ret_from_kernel_thread+0xc/0x14
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> when trying to boot the m68k:q800 machine in qemu.
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> An added debug message in create_cache() shows the reason:
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> #### freeptr_offset=154 object_size=182 flags=0x310 aligned=0 sizeof(freeptr_t)=4
> >> >>>>>>>>>>>>>
> >> >>>>>>>>>>>>> freeptr_offset would need to be 4-byte aligned but that is not the
> >> >>>>>>>>>>>>> case on m68k.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Why is ->work 2-byte aligned to begin with on m68k?!
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> My understanding is that m68k does not align pointers.
> >> >>>>>>>>>>
> >> >>>>>>>>>> The minimum alignment for multi-byte integral values on m68k is
> >> >>>>>>>>>> 2 bytes.
> >> >>>>>>>>>>
> >> >>>>>>>>>> See also the comment at
> >> >>>>>>>>>> https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46
> >> >>>>>>>>>
> >> >>>>>>>>> Maybe it's time we put m68k to bed? :-)
> >> >>>>>>>>>
> >> >>>>>>>>> We can add a forced alignment ->work to be 4 bytes, won't change
> >> >>>>>>>>> anything on anything remotely current. But does feel pretty hacky to
> >> >>>>>>>>> need to align based on some ancient thing.
> >> >>>>>>>>
> >> >>>>>>>> Why does freeptr_offset need to be 4-byte aligned?
> >> >>>>>>>
> >> >>>>>>> Didn't check, but it's slab/slub complaining using a 2-byte aligned
> >> >>>>>>> address for the free pointer offset. It's explicitly checking:
> >> >>>>>>>
> >> >>>>>>>          /* If a custom freelist pointer is requested make sure it's sane. */
> >> >>>>>>>          err = -EINVAL;
> >> >>>>>>>          if (args->use_freeptr_offset &&
> >> >>>>>>>              (args->freeptr_offset >= object_size ||
> >> >>>>>>>               !(flags & SLAB_TYPESAFE_BY_RCU) ||
> >> >>>>>>>               !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
> >                                                           ^^^^^^
> >
> >> >>>>>>>                  goto out;
> >> >>>>>>
> >> >>>>>> It is not guaranteed that alignof(freeptr_t) >= sizeof(freeptr_t)
> >> >>>>>> (free_ptr is sort of a long). If freeptr_offset must be a multiple of
> >> >>>>>> 4 or 8 bytes,
> >> >>>>>> the code that assigns it must make sure that is true.
> >> >>>>>
> >> >>>>> Right, this is what the email is about...
> >> >>>>>
> >> >>>>>> I guess this is the code in fs/file_table.c:
> >> >>>>>>
> >> >>>>>>      .freeptr_offset = offsetof(struct file, f_freeptr),
> >> >>>>>>
> >> >>>>>> which references:
> >> >>>>>>
> >> >>>>>>      include/linux/fs.h:           freeptr_t               f_freeptr;
> >> >>>>>>
> >> >>>>>> I guess the simplest solution is to add an __aligned(sizeof(freeptr_t))
> >> >>>>>> (or __aligned(sizeof(long)) to the definition of freeptr_t:
> >> >>>>>>
> >> >>>>>>      include/linux/slab.h:typedef struct { unsigned long v; } freeptr_t;
> >> >>>>>
> >> >>>>> It's not, it's struct io_kiocb->work, as per the stack trace in this
> >> >>>>> email.
> >> >>>>
> >> >>>> Sorry, I was falling out of thin air into this thread...
> >> >>>>
> >> >>>> linux-next/master:io_uring/io_uring.c:          .freeptr_offset =
> >> >>>> offsetof(struct io_kiocb, work),
> >> >>>> linux-next/master:io_uring/io_uring.c:          .use_freeptr_offset = true,
> >> >>>>
> >> >>>> Apparently io_kiocb.work is of type struct io_wq_work, not freeptr_t?
> >> >>>> Isn't that a bit error-prone, as the slab core code expects a freeptr_t?
> >> >>>
> >> >>> It just needs the space, should not matter otherwise. But may as well
> >> >>> just add the union and align the freeptr so it stops complaining on m68k.
> >> >>
> >> >> Ala the below, perhaps alignment takes care of itself then?
> >> >
> >> > No, that doesn't work (I tried), at least not on its own, because the pointer
> >> > is still unaligned on m68k.
> >>
> >> Yeah we'll likely need to force it. The below should work, I presume?
> >> Feels pretty odd to have to align it to the size of it, when that should
> >> naturally occur... Crusty legacy archs.
> >>
> >> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> >> index 593c10a02144..8ed9c6923668 100644
> >> --- a/include/linux/io_uring_types.h
> >> +++ b/include/linux/io_uring_types.h
> >> @@ -674,7 +674,11 @@ struct io_kiocb {
> >>         struct io_kiocb                 *link;
> >>         /* custom credentials, valid IFF REQ_F_CREDS is set */
> >>         const struct cred               *creds;
> >> -       struct io_wq_work               work;
> >> +
> >> +       union {
> >> +               struct io_wq_work       work;
> >> +               freeptr_t               freeptr __aligned(sizeof(freeptr_t));
> >
> > I'd rather add the __aligned() to the definition of freeptr_t, so it
> > applies to all (future) users.
> >
> > But my main question stays: why is the slab code checking
> > IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t)?
>
> I believe it's to match how SLUB normally calculates the offset if no
> explicit one is given, in calculate_sizes():
>
> s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
>
> Yes there's a sizeof(void *) because freepointer used to be just that and we
> forgot to update this place when freepointer_t was introduced (by Jann in
> 44f6a42d49350) for handling CONFIG_SLAB_FREELIST_HARDENED. In
> get_freepointer() you can see how there's a cast to a pointer eventually.
>
> Does m68k have different alignment for pointer and unsigned long or both are
> 2 bytes? Or any other arch, i.e. should get_freepointer be a union with
> unsigned long and void * instead? (or it doesn't matter?)

The default alignment for int, long, and pointer is 2 on m68k.
On CRIS (no longer supported by Linux), it was 1, IIRC.
So the union won't make a difference.
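
To make the sizeof() versus alignment distinction concrete, here is a small
sketch that mimics the m68k layout on an ordinary host. Everything in it is an
assumption for illustration: demo_freeptr_t and struct demo_obj are made-up
names, and the GCC/Clang packed/aligned attributes are only used to force the
2-byte alignment that m68k has by default.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for freeptr_t. The GCC/Clang attributes lower its
 * alignment to 2 so that a typical host mimics the m68k layout rules.
 */
typedef struct { unsigned long v; } __attribute__((packed, aligned(2))) demo_freeptr_t;

/* Hypothetical object: a 2-byte field followed directly by the free pointer. */
struct demo_obj {
	unsigned short	tag;
	demo_freeptr_t	freeptr;
};

/* Same idea as the kernel's IS_ALIGNED() for power-of-two alignments. */
static bool is_aligned(size_t x, size_t a)
{
	return (x & (a - 1)) == 0;
}

/* Offset checked against sizeof(), as the slab code currently does. */
static bool offset_ok_by_size(void)
{
	return is_aligned(offsetof(struct demo_obj, freeptr),
			  sizeof(demo_freeptr_t));
}

/* Offset checked against the type's actual alignment requirement. */
static bool offset_ok_by_alignof(void)
{
	return is_aligned(offsetof(struct demo_obj, freeptr),
			  _Alignof(demo_freeptr_t));
}
```

Here the compiler places freeptr at offset 2, which satisfies the type's
alignment but is not a multiple of its size, so only the second check passes.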

> > Perhaps that was just intended to be __alignof__ instead of sizeof()?
>
> Would it do the right thing everywhere, given the explanation above?

It depends. Does anything rely on the offset being a multiple of (at
least) 4?
E.g. does anything count in multiples of longs (hi BCPL! ;-), or are
the 2 LSB used for a special purpose? (cf. maple_tree, which uses
bit 0: https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46)
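
As a rough illustration of why the low bits matter: any value that is a
multiple of a power-of-two alignment has that many low-order bits clear, and
code that stores tags there depends on them. The helper below is hypothetical
and does not describe what maple_tree or SLUB actually do.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of low-order bits guaranteed to be zero in any
 * value that is a multiple of the given power-of-two alignment.
 */
static unsigned int spare_low_bits(size_t align)
{
	unsigned int bits = 0;

	assert(align != 0 && (align & (align - 1)) == 0);	/* power of two only */
	while (align > 1) {
		align >>= 1;
		bits++;
	}
	return bits;
}
```

With 2-byte alignment only bit 0 is guaranteed clear, while 4-byte alignment
guarantees two clear bits, so relaxing the check is only safe if nothing uses
bit 1 of the offset or address.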

Gr{oetje,eeting}s,

                        Geert
Vlastimil Babka Nov. 20, 2024, 9:37 a.m. UTC | #20
On 11/20/24 10:07, Geert Uytterhoeven wrote:
> Hi Vlastimil,
> 
>> >>
>> >> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
>> >> index 593c10a02144..8ed9c6923668 100644
>> >> --- a/include/linux/io_uring_types.h
>> >> +++ b/include/linux/io_uring_types.h
>> >> @@ -674,7 +674,11 @@ struct io_kiocb {
>> >>         struct io_kiocb                 *link;
>> >>         /* custom credentials, valid IFF REQ_F_CREDS is set */
>> >>         const struct cred               *creds;
>> >> -       struct io_wq_work               work;
>> >> +
>> >> +       union {
>> >> +               struct io_wq_work       work;
>> >> +               freeptr_t               freeptr __aligned(sizeof(freeptr_t));
>> >
>> > I'd rather add the __aligned() to the definition of freeptr_t, so it
>> > applies to all (future) users.
>> >
>> > But my main question stays: why is the slab code checking
>> > IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t)?
>>
>> I believe it's to match how SLUB normally calculates the offset if no
>> explicit one is given, in calculate_sizes():
>>
>> s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
>>
>> Yes there's a sizeof(void *) because freepointer used to be just that and we
>> forgot to update this place when freepointer_t was introduced (by Jann in
>> 44f6a42d49350) for handling CONFIG_SLAB_FREELIST_HARDENED. In
>> get_freepointer() you can see how there's a cast to a pointer eventually.
>>
>> Does m68k have different alignment for pointer and unsigned long or both are
>> 2 bytes? Or any other arch, i.e. should get_freepointer be a union with
>> unsigned long and void * instead? (or it doesn't matter?)
> 
> The default alignment for int, long, and pointer is 2 on m68k.
> On CRIS (no longer supported by Linux), it was 1, IIRC.
> So the union won't make a difference.
> 
>> > Perhaps that was just intended to be __alignof__ instead of sizeof()?
>>
>> Would it do the right thing everywhere, given the explanation above?
> 
> It depends. Does anything rely on the offset being a multiple of (at
> least) 4?
> E.g. does anything counts in multiples of longs (hi BCPL! ;-), or are
> the 2 LSB used for a special purpose? (cfr. maple_tree, which uses
> bit 0 (https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46)?

AFAIK no, the goal was just to prevent misaligned accesses. Kees added the:

s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));

so maybe he had something else in mind. But I suspect it was just because
the code already used it elsewhere.

So we might want something like the patch below? That would be safer for 6.14
though, so I'd suggest the io_uring-specific fix in the meantime. Or, as the
6.13 io_uring fix, maybe just add the union with freeptr_t but without
__aligned, plus only the mm/slab_common.c part of the patch below?

diff --git a/mm/slab_common.c b/mm/slab_common.c
index 893d32059915..477fa471da18 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -230,7 +230,7 @@ static struct kmem_cache *create_cache(const char *name,
 	if (args->use_freeptr_offset &&
 	    (args->freeptr_offset >= object_size ||
 	     !(flags & SLAB_TYPESAFE_BY_RCU) ||
-	     !IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t))))
+	     !IS_ALIGNED(args->freeptr_offset, __alignof__(freeptr_t))))
 		goto out;
 
 	err = -ENOMEM;
diff --git a/mm/slub.c b/mm/slub.c
index 5b832512044e..6ad904be7700 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5287,11 +5287,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 	unsigned int size = s->object_size;
 	unsigned int order;
 
-	/*
-	 * Round up object size to the next word boundary. We can only
-	 * place the free pointer at word boundaries and this determines
-	 * the possible location of the free pointer.
-	 */
+	/* Round up object size to the next word boundary. */
 	size = ALIGN(size, sizeof(void *));
 
 #ifdef CONFIG_SLUB_DEBUG
@@ -5325,7 +5321,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 	if (((flags & SLAB_TYPESAFE_BY_RCU) && !args->use_freeptr_offset) ||
 	    (flags & SLAB_POISON) || s->ctor ||
 	    ((flags & SLAB_RED_ZONE) &&
-	     (s->object_size < sizeof(void *) || slub_debug_orig_size(s)))) {
+	     (s->object_size < sizeof(freeptr_t) || slub_debug_orig_size(s)))) {
 		/*
 		 * Relocate free pointer after the object if it is not
 		 * permitted to overwrite the first word of the object on
@@ -5343,7 +5339,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 		 * longer true, the function needs to be modified.
 		 */
 		s->offset = size;
-		size += sizeof(void *);
+		size += sizeof(freeptr_t);
 	} else if ((flags & SLAB_TYPESAFE_BY_RCU) && args->use_freeptr_offset) {
 		s->offset = args->freeptr_offset;
 	} else {
@@ -5352,7 +5348,7 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s)
 		 * it away from the edges of the object to avoid small
 		 * sized over/underflows from neighboring allocations.
 		 */
-		s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
+		s->offset = ALIGN_DOWN(s->object_size / 2, __alignof__(freeptr_t));
 	}
 
 #ifdef CONFIG_SLUB_DEBUG
Geert Uytterhoeven Nov. 20, 2024, 12:48 p.m. UTC | #21
Hi Vlastimil,

On Wed, Nov 20, 2024 at 10:37 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 11/20/24 10:07, Geert Uytterhoeven wrote:
> >> >> diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> >> >> index 593c10a02144..8ed9c6923668 100644
> >> >> --- a/include/linux/io_uring_types.h
> >> >> +++ b/include/linux/io_uring_types.h
> >> >> @@ -674,7 +674,11 @@ struct io_kiocb {
> >> >>         struct io_kiocb                 *link;
> >> >>         /* custom credentials, valid IFF REQ_F_CREDS is set */
> >> >>         const struct cred               *creds;
> >> >> -       struct io_wq_work               work;
> >> >> +
> >> >> +       union {
> >> >> +               struct io_wq_work       work;
> >> >> +               freeptr_t               freeptr __aligned(sizeof(freeptr_t));
> >> >
> >> > I'd rather add the __aligned() to the definition of freeptr_t, so it
> >> > applies to all (future) users.
> >> >
> >> > But my main question stays: why is the slab code checking
> >> > IS_ALIGNED(args->freeptr_offset, sizeof(freeptr_t)?
> >>
> >> I believe it's to match how SLUB normally calculates the offset if no
> >> explicit one is given, in calculate_sizes():
> >>
> >> s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
> >>
> >> Yes there's a sizeof(void *) because freepointer used to be just that and we
> >> forgot to update this place when freepointer_t was introduced (by Jann in
> >> 44f6a42d49350) for handling CONFIG_SLAB_FREELIST_HARDENED. In
> >> get_freepointer() you can see how there's a cast to a pointer eventually.
> >>
> >> Does m68k have different alignment for pointer and unsigned long or both are
> >> 2 bytes? Or any other arch, i.e. should get_freepointer be a union with
> >> unsigned long and void * instead? (or it doesn't matter?)
> >
> > The default alignment for int, long, and pointer is 2 on m68k.
> > On CRIS (no longer supported by Linux), it was 1, IIRC.
> > So the union won't make a difference.
> >
> >> > Perhaps that was just intended to be __alignof__ instead of sizeof()?
> >>
> >> Would it do the right thing everywhere, given the explanation above?
> >
> > It depends. Does anything rely on the offset being a multiple of (at
> > least) 4?
> > E.g. does anything counts in multiples of longs (hi BCPL! ;-), or are
> > the 2 LSB used for a special purpose? (cfr. maple_tree, which uses
> > bit 0 (https://elixir.bootlin.com/linux/v6.12/source/include/linux/maple_tree.h#L46)?
>
> AFAIK no, the goal was just to prevent misaligned accesses. Kees added the:
>
> s->offset = ALIGN_DOWN(s->object_size / 2, sizeof(void *));
>
> so maybe he had something else in mind. But I suspect it was just because
> the code already used it elsewhere.
>
> So we might want something like this? But that would be safer for 6.14 so
> I'd suggest the io_uring specific fix meanwhile. Or maybe just add the union
> with freeptr_t but without __aligned plus the part below that changes
> mm/slab_common.c only, as the 6.13 io_uring fix?

As it seems to work fine with s/sizeof/__alignof/, I have submitted
a patch to just make that change:
https://lore.kernel.org/80c767a5d5927c099aea5178fbf2c897b459fa90.1732106544.git.geert@linux-m68k.org
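
The effect of that substitution on the values from the original report can be
sketched as follows. This is a simplified userspace model, not the kernel
function: freeptr_offset_valid() is a hypothetical name, the
SLAB_TYPESAFE_BY_RCU flag test is left out, and the "align" parameter stands
for either sizeof(freeptr_t) or __alignof__(freeptr_t).

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Simplified model of the custom free pointer sanity check in create_cache().
 * "align" must be a power of two; it is what the offset is checked against.
 */
static bool freeptr_offset_valid(size_t freeptr_offset, size_t object_size,
				 size_t align)
{
	if (freeptr_offset >= object_size)
		return false;
	return (freeptr_offset & (align - 1)) == 0;
}
```

For freeptr_offset=154 and object_size=182, the check fails with align=4
(sizeof(freeptr_t) on m68k) and passes with align=2 (its alignment there).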

Gr{oetje,eeting}s,

                        Geert

Patch

diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 2863b957e373..a09c67b38c1b 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3846,6 +3846,8 @@  static int __init io_uring_init(void)
 	struct kmem_cache_args kmem_args = {
 		.useroffset = offsetof(struct io_kiocb, cmd.data),
 		.usersize = sizeof_field(struct io_kiocb, cmd.data),
+		.freeptr_offset = offsetof(struct io_kiocb, work),
+		.use_freeptr_offset = true,
 	};
 
 #define __BUILD_BUG_VERIFY_OFFSET_SIZE(stype, eoffset, esize, ename) do { \