[v4,0/3] Hardening perf subsystem

Message ID	AS8PR02MB7237F5BFDAA793E15692B3998BFD2@AS8PR02MB7237.eurprd02.prod.outlook.com (mailing list archive)
Headers	show Received: from EUR05-DB8-obe.outbound.protection.outlook.com (mail-db8eur05olkn2101.outbound.protection.outlook.com [40.92.89.101]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 176591527A0; Sat, 1 Jun 2024 16:56:49 +0000 (UTC) From: Erick Archer <erick.archer@outlook.com> To: Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Arnaldo Carvalho de Melo <acme@kernel.org>, Namhyung Kim <namhyung@kernel.org>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>, Adrian Hunter <adrian.hunter@intel.com>, "Liang, Kan" <kan.liang@linux.intel.com>, Thomas Gleixner <tglx@linutronix.de>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, "H. Peter Anvin" <hpa@zytor.com>, Kees Cook <keescook@chromium.org>, "Gustavo A. R. Silva" <gustavoars@kernel.org>, Nathan Chancellor <nathan@kernel.org>, Nick Desaulniers <ndesaulniers@google.com>, Bill Wendling <morbo@google.com>, Justin Stitt <justinstitt@google.com>, Christophe JAILLET <christophe.jaillet@wanadoo.fr>, Matthew Wilcox <mawilcox@microsoft.com> Cc: Erick Archer <erick.archer@outlook.com>, x86@kernel.org, linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org, llvm@lists.linux.dev Subject: [PATCH v4 0/3] Hardening perf subsystem Date: Sat, 1 Jun 2024 18:56:15 +0200 Message-ID: <AS8PR02MB7237F5BFDAA793E15692B3998BFD2@AS8PR02MB7237.eurprd02.prod.outlook.com> Content-Transfer-Encoding: 8bit Content-Type: text/plain Precedence: bulk MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1
Series	Hardening perf subsystem \| expand [v4,0/3] Hardening perf subsystem [v4,1/3] perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx [v4,2/3] perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic [v4,3/3] perf/ring_buffer: Prefer struct_size over open coded arithmetic

Erick Archer June 1, 2024, 4:56 p.m. UTC

Hi everyone,

This is an effort to get rid of all multiplications from allocation
functions in order to prevent integer overflows [1][2].

In the first patch, the "struct amd_uncore_ctx" can be refactored to
use a flex array for the "events" member. This way, the allocation/
freeing of the memory can be simplified. Then, the struct_size()
helper can be used to do the arithmetic calculation for the memory
to be allocated.

In the second patch, as the "struct intel_uncore_box" ends in a
flexible array, the preferred way in the kernel is to use the
struct_size() helper to do the arithmetic instead of the calculation
"size + count * size" in the kzalloc_node() function.

In the third patch, as the "struct perf_buffer" also ends in a
flexible array, the preferred way in the kernel is to use the
struct_size() helper to do the arithmetic instead of the calculation
"size + count * size" in the kzalloc_node() functions. At the same
time, prepare for the coming implementation by GCC and Clang of the
__counted_by attribute.

This time, I have decided to send these three patches in the same serie
since all of them has been rejected by the maintainers. I have used
the v4 tag since it is the latest iteration in one of the patches.

The reason these patches were rejected is that Peter Zijlstra detest
the struct_size() helper [3][4]. However, Kees Cook and I consider that
the use of this helper improves readability. But we can all say that it
is a matter of preference.

Anyway, leaving aside personal preferences, these patches has the
following pros:

- Code robustness improvement (__counted_by coverage).
- Code robustness improvement (use of struct_size() to do calculations
  on memory allocator functions).
- Fewer lines of code.
- Follow the recommendations made in "Deprecated Interfaces, Language
  Features, Attributes, and Conventions" [5] as the preferred method
  of doing things in the kernel.
- Widely used in the kernel.
- Widely accepted in the kernel.

There are also patches in this subsystem that use the struct_size()
helper:

6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me

Therefore, I would like these patches to be applied this time.

Best regards,
Erick

Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
Link: https://github.com/KSPP/linux/issues/160 [2]
Link: https://lore.kernel.org/linux-hardening/20240430091833.GD40213@noisy.programming.kicks-ass.net [3]
Link: https://lore.kernel.org/linux-hardening/20240430091504.GC40213@noisy.programming.kicks-ass.net [4]
Link: https://docs.kernel.org/process/deprecated.html [5]
---
Changes in v4:

- Add the "Reviewed-by:" tag to the three patches.
- Rebase against linux next (tag next-20240531).

Previous versions:

perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
  v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
  v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/

perf/ring_buffer: Prefer struct_size over open coded arithmetic
  v3 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
  v2 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237569E4FBE0B26F62FDFDB8B1D2@AS8PR02MB7237.eurprd02.prod.outlook.com/
  v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72372AB065EA8340D960CCC48B1B2@AS8PR02MB7237.eurprd02.prod.outlook.com/

  Changes in v3:
  - Refactor the logic, compared to the previous version, of the second
    rb_alloc() function to gain __counted_by() coverage (Kees Cook and
    Christophe JAILLET).

  Changes in v2:
  - Annotate "struct perf_buffer" with __counted_by() attribute (Kees Cook).
  - Refactor the logic to gain __counted_by() coverage (Kees Cook).
---
Erick Archer (3):
  perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
  perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
  perf/ring_buffer: Prefer struct_size over open coded arithmetic

 arch/x86/events/amd/uncore.c   | 18 +++++-------------
 arch/x86/events/intel/uncore.c |  7 +++----
 kernel/events/internal.h       |  2 +-
 kernel/events/ring_buffer.c    | 15 ++++-----------
 4 files changed, 13 insertions(+), 29 deletions(-)

Erick Archer June 8, 2024, 8:50 a.m. UTC | #1

Hi Andrew,

On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> Hi everyone,
> 
> This is an effort to get rid of all multiplications from allocation
> functions in order to prevent integer overflows [1][2].
> 
> In the first patch, the "struct amd_uncore_ctx" can be refactored to
> use a flex array for the "events" member. This way, the allocation/
> freeing of the memory can be simplified. Then, the struct_size()
> helper can be used to do the arithmetic calculation for the memory
> to be allocated.
> 
> In the second patch, as the "struct intel_uncore_box" ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() function.
> 
> In the third patch, as the "struct perf_buffer" also ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() functions. At the same
> time, prepare for the coming implementation by GCC and Clang of the
> __counted_by attribute.
> 
> This time, I have decided to send these three patches in the same serie
> since all of them has been rejected by the maintainers. I have used
> the v4 tag since it is the latest iteration in one of the patches.
> 
> The reason these patches were rejected is that Peter Zijlstra detest
> the struct_size() helper [3][4]. However, Kees Cook and I consider that
> the use of this helper improves readability. But we can all say that it
> is a matter of preference.
> 
> Anyway, leaving aside personal preferences, these patches has the
> following pros:
> 
> - Code robustness improvement (__counted_by coverage).
> - Code robustness improvement (use of struct_size() to do calculations
>   on memory allocator functions).
> - Fewer lines of code.
> - Follow the recommendations made in "Deprecated Interfaces, Language
>   Features, Attributes, and Conventions" [5] as the preferred method
>   of doing things in the kernel.
> - Widely used in the kernel.
> - Widely accepted in the kernel.
> 
> There are also patches in this subsystem that use the struct_size()
> helper:
> 
> 6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
> dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me
> 
> Therefore, I would like these patches to be applied this time.

This is my last attemp to get these patches applied. I have decided to
send this mail to try to unjam this situation. I have folowed all the
reviewers comments and have no response from the maintainers other than
"I detest the struct_size() helper".

Therefore, I would like to know your opinion and that of other people
about these patches. If the final consensus is that the code has no real
benefit, I will stop insisting on it ;)

Regards,
Erick
> 
> Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments [1]
> Link: https://github.com/KSPP/linux/issues/160 [2]
> Link: https://lore.kernel.org/linux-hardening/20240430091833.GD40213@noisy.programming.kicks-ass.net [3]
> Link: https://lore.kernel.org/linux-hardening/20240430091504.GC40213@noisy.programming.kicks-ass.net [4]
> Link: https://docs.kernel.org/process/deprecated.html [5]
> ---
> Changes in v4:
> 
> - Add the "Reviewed-by:" tag to the three patches.
> - Rebase against linux next (tag next-20240531).
> 
> Previous versions:
> 
> perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
>   v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
> 
> perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
>   v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/
> 
> perf/ring_buffer: Prefer struct_size over open coded arithmetic
>   v3 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/
>   v2 -> https://lore.kernel.org/linux-hardening/AS8PR02MB7237569E4FBE0B26F62FDFDB8B1D2@AS8PR02MB7237.eurprd02.prod.outlook.com/
>   v1 -> https://lore.kernel.org/linux-hardening/AS8PR02MB72372AB065EA8340D960CCC48B1B2@AS8PR02MB7237.eurprd02.prod.outlook.com/
> 
>   Changes in v3:
>   - Refactor the logic, compared to the previous version, of the second
>     rb_alloc() function to gain __counted_by() coverage (Kees Cook and
>     Christophe JAILLET).
> 
>   Changes in v2:
>   - Annotate "struct perf_buffer" with __counted_by() attribute (Kees Cook).
>   - Refactor the logic to gain __counted_by() coverage (Kees Cook).
> ---
> Erick Archer (3):
>   perf/x86/amd/uncore: Add flex array to struct amd_uncore_ctx
>   perf/x86/intel/uncore: Prefer struct_size over open coded arithmetic
>   perf/ring_buffer: Prefer struct_size over open coded arithmetic
> 
>  arch/x86/events/amd/uncore.c   | 18 +++++-------------
>  arch/x86/events/intel/uncore.c |  7 +++----
>  kernel/events/internal.h       |  2 +-
>  kernel/events/ring_buffer.c    | 15 ++++-----------
>  4 files changed, 13 insertions(+), 29 deletions(-)
> 
> -- 
> 2.25.1
>

Peter Zijlstra June 10, 2024, 10:06 a.m. UTC | #2

On Sat, Jun 08, 2024 at 10:50:44AM +0200, Erick Archer wrote:
> Hi Andrew,
> 
> On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > Hi everyone,
> > 
> > This is an effort to get rid of all multiplications from allocation
> > functions in order to prevent integer overflows [1][2].
> > 
> > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > use a flex array for the "events" member. This way, the allocation/
> > freeing of the memory can be simplified. Then, the struct_size()
> > helper can be used to do the arithmetic calculation for the memory
> > to be allocated.
> > 
> > In the second patch, as the "struct intel_uncore_box" ends in a
> > flexible array, the preferred way in the kernel is to use the
> > struct_size() helper to do the arithmetic instead of the calculation
> > "size + count * size" in the kzalloc_node() function.
> > 
> > In the third patch, as the "struct perf_buffer" also ends in a
> > flexible array, the preferred way in the kernel is to use the
> > struct_size() helper to do the arithmetic instead of the calculation
> > "size + count * size" in the kzalloc_node() functions. At the same
> > time, prepare for the coming implementation by GCC and Clang of the
> > __counted_by attribute.
> > 
> > This time, I have decided to send these three patches in the same serie
> > since all of them has been rejected by the maintainers. I have used
> > the v4 tag since it is the latest iteration in one of the patches.
> > 
> > The reason these patches were rejected is that Peter Zijlstra detest
> > the struct_size() helper [3][4]. However, Kees Cook and I consider that
> > the use of this helper improves readability. But we can all say that it
> > is a matter of preference.
> > 
> > Anyway, leaving aside personal preferences, these patches has the
> > following pros:
> > 
> > - Code robustness improvement (__counted_by coverage).
> > - Code robustness improvement (use of struct_size() to do calculations
> >   on memory allocator functions).
> > - Fewer lines of code.
> > - Follow the recommendations made in "Deprecated Interfaces, Language
> >   Features, Attributes, and Conventions" [5] as the preferred method
> >   of doing things in the kernel.
> > - Widely used in the kernel.
> > - Widely accepted in the kernel.
> > 
> > There are also patches in this subsystem that use the struct_size()
> > helper:
> > 
> > 6566f907bf31 ("Convert intel uncore to struct_size") by Matthew Wilcox
> > dfbc411e0a5e ("perf/x86/rapl: Prefer struct_size() over open coded arithmetic") by me
> > 
> > Therefore, I would like these patches to be applied this time.
> 
> This is my last attemp to get these patches applied. I have decided to
> send this mail to try to unjam this situation. I have folowed all the
> reviewers comments and have no response from the maintainers other than
> "I detest the struct_size() helper".
> 
> Therefore, I would like to know your opinion and that of other people
> about these patches. If the final consensus is that the code has no real
> benefit, I will stop insisting on it ;)

Seriously, I've got plenty patches to look at that actually do
something. This falls well within the 'random changes for changes sake'
and goes waaaaay down the priority list.

If you're addressing an actual issue, state so. Otherwise, go play
somewhere else.

Kees Cook June 10, 2024, 5:28 p.m. UTC | #3

On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> Hi everyone,
> 
> This is an effort to get rid of all multiplications from allocation
> functions in order to prevent integer overflows [1][2].

I didn't actually see these 3 patches in this thread nor via lore.

> In the first patch, the "struct amd_uncore_ctx" can be refactored to
> use a flex array for the "events" member. This way, the allocation/
> freeing of the memory can be simplified. Then, the struct_size()
> helper can be used to do the arithmetic calculation for the memory
> to be allocated.

I like this patch because it reduces the allocation from 2 to 1. This
isn't what Peter might see as "churn": this is an improvement in resource
utilization.

I think it's here:
https://lore.kernel.org/linux-hardening/AS8PR02MB7237E4848B44A5226BD3551C8BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

> In the second patch, as the "struct intel_uncore_box" ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() function.

This is
https://lore.kernel.org/linux-hardening/AS8PR02MB7237F4D39BF6AA0FF40E91638B392@AS8PR02MB7237.eurprd02.prod.outlook.com/

I prefer this style, as it makes things unambiguous ("this will never
wrap around") without having to check the associated types and doesn't make
the resulting binary code different in the "can never overflow" case.

In this particular case:

int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);

"int numshared" comes from struct intel_uncore_type::num_shared_regs,
which is:

        unsigned num_shared_regs:8;

And the struct sizes are:

$ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
        /* size: 488, cachelines: 8, members: 19 */
$ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
        /* size: 96, cachelines: 2, members: 5 */

So we have:

s32 size = 488 + u8 * 96

Max size here is 24968 so it can never overflow an s32, so I can see
why Peter views this as "churn".

I still think the patch is a coding style improvement, but okay.

> In the third patch, as the "struct perf_buffer" also ends in a
> flexible array, the preferred way in the kernel is to use the
> struct_size() helper to do the arithmetic instead of the calculation
> "size + count * size" in the kzalloc_node() functions. At the same
> time, prepare for the coming implementation by GCC and Clang of the
> __counted_by attribute.

This is
https://lore.kernel.org/linux-hardening/AS8PR02MB72379B4807F3951A1B926BA58BE02@AS8PR02MB7237.eurprd02.prod.outlook.com/

This provides __counted_by coverage, and I think this is important to
gain in ever place we can. Given that this is part of a ring buffer
implementation that is arbitrarily sized, this is exactly the kind of
place I'd like to see __counted_by used. This is a runtime robustness
improvement, so I don't see this a "churn" at all.

Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
put them in the hardening tree to keep them out of your way. It seems
clear you don't want patch 2 at all.

-Kees

Peter Zijlstra June 10, 2024, 8:05 p.m. UTC | #4

On Mon, Jun 10, 2024 at 10:28:52AM -0700, Kees Cook wrote:
> On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > Hi everyone,
> > 
> > This is an effort to get rid of all multiplications from allocation
> > functions in order to prevent integer overflows [1][2].
> 
> I didn't actually see these 3 patches in this thread nor via lore.

He managed to break threading between 0/n and the rest.

> > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > use a flex array for the "events" member. This way, the allocation/
> > freeing of the memory can be simplified. Then, the struct_size()
> > helper can be used to do the arithmetic calculation for the memory
> > to be allocated.
> 
> I like this patch because it reduces the allocation from 2 to 1. This
> isn't what Peter might see as "churn": this is an improvement in resource
> utilization.

But then he went and used that struct_size() abomination :/

> I prefer this style, as it makes things unambiguous ("this will never
> wrap around") without having to check the associated types and doesn't make
> the resulting binary code different in the "can never overflow" case.
> 
> In this particular case:
> 
> int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
> 
> "int numshared" comes from struct intel_uncore_type::num_shared_regs,
> which is:
> 
>         unsigned num_shared_regs:8;
> 
> And the struct sizes are:
> 
> $ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
>         /* size: 488, cachelines: 8, members: 19 */
> $ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
>         /* size: 96, cachelines: 2, members: 5 */
> 
> So we have:
> 
> s32 size = 488 + u8 * 96
> 
> Max size here is 24968 so it can never overflow an s32, so I can see
> why Peter views this as "churn".
> 
> I still think the patch is a coding style improvement, but okay.

I really detest this thing because it makes what was trivially readable
into something opaque. Get me that type qualifier that traps on overflow
and write plain C. All this __builtin_overflow garbage is just that,
unreadable nonsense.

> This provides __counted_by coverage, and I think this is important to
> gain in ever place we can. Given that this is part of a ring buffer
> implementation that is arbitrarily sized, this is exactly the kind of
> place I'd like to see __counted_by used. This is a runtime robustness
> improvement, so I don't see this a "churn" at all.

Again, mixed in with that other crap. Anyway, remind me wth this
__counted_by thing actually does?

> Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> put them in the hardening tree to keep them out of your way. It seems
> clear you don't want patch 2 at all.

I prefer to not have struct_size() anywhere at all. Please just write
readable code.

Again, if you really care about that overflow muck, get a useable C type
qualifier or something so we can write readable code.

Kees Cook June 10, 2024, 9:46 p.m. UTC | #5

On Mon, Jun 10, 2024 at 10:05:44PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 10:28:52AM -0700, Kees Cook wrote:
> > On Sat, Jun 01, 2024 at 06:56:15PM +0200, Erick Archer wrote:
> > > Hi everyone,
> > > 
> > > This is an effort to get rid of all multiplications from allocation
> > > functions in order to prevent integer overflows [1][2].
> > 
> > I didn't actually see these 3 patches in this thread nor via lore.
> 
> He managed to break threading between 0/n and the rest.
> 
> > > In the first patch, the "struct amd_uncore_ctx" can be refactored to
> > > use a flex array for the "events" member. This way, the allocation/
> > > freeing of the memory can be simplified. Then, the struct_size()
> > > helper can be used to do the arithmetic calculation for the memory
> > > to be allocated.
> > 
> > I like this patch because it reduces the allocation from 2 to 1. This
> > isn't what Peter might see as "churn": this is an improvement in resource
> > utilization.
> 
> But then he went and used that struct_size() abomination :/
> 
> > I prefer this style, as it makes things unambiguous ("this will never
> > wrap around") without having to check the associated types and doesn't make
> > the resulting binary code different in the "can never overflow" case.
> > 
> > In this particular case:
> > 
> > int size = sizeof(*box) + numshared * sizeof(struct intel_uncore_extra_reg);
> > 
> > "int numshared" comes from struct intel_uncore_type::num_shared_regs,
> > which is:
> > 
> >         unsigned num_shared_regs:8;
> > 
> > And the struct sizes are:
> > 
> > $ pahole -C intel_uncore_box gcc-boot/vmlinux | grep size:
> >         /* size: 488, cachelines: 8, members: 19 */
> > $ pahole -C intel_uncore_extra_reg gcc-boot/vmlinux | grep size:
> >         /* size: 96, cachelines: 2, members: 5 */
> > 
> > So we have:
> > 
> > s32 size = 488 + u8 * 96
> > 
> > Max size here is 24968 so it can never overflow an s32, so I can see
> > why Peter views this as "churn".
> > 
> > I still think the patch is a coding style improvement, but okay.
> 
> I really detest this thing because it makes what was trivially readable
> into something opaque. Get me that type qualifier that traps on overflow
> and write plain C. All this __builtin_overflow garbage is just that,
> unreadable nonsense.

It's more readable than container_of(), IMO. "give me the struct size
for variable VAR, which has a flexible array MEMBER, when we have COUNT
many of them": struct_size(VAR, MEMBER, COUNT). It's more readable, more
robust, and provides saturation in the face of potential wrap-around.

> > This provides __counted_by coverage, and I think this is important to
> > gain in ever place we can. Given that this is part of a ring buffer
> > implementation that is arbitrarily sized, this is exactly the kind of
> > place I'd like to see __counted_by used. This is a runtime robustness
> > improvement, so I don't see this a "churn" at all.
> 
> Again, mixed in with that other crap. Anyway, remind me wth this
> __counted_by thing actually does?

It provides annotation for the compiler to perform run-time bounds
checking on dynamically sized arrays. i.e. CONFIG_FORTIFY_SOURCE and
CONFIG_UBSAN_BOUNDS can actually reason about annotated flexible arrays
instead of just saying "oh no a flexible array, I give up".

> > Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> > put them in the hardening tree to keep them out of your way. It seems
> > clear you don't want patch 2 at all.
> 
> I prefer to not have struct_size() anywhere at all. Please just write
> readable code.

That ship has sailed, and it has been keeping things at bay for a while
now. As we make progress on making the compiler able to do this more
naturally, we can work on replacing struct_size(), but it's in use
globally and it's useful both for catching runtime mistakes and for
catching compile-time mistakes (the flexible array has to match the
variable's struct).

-Kees

Peter Zijlstra June 11, 2024, 7:55 a.m. UTC | #6

On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:

> > I really detest this thing because it makes what was trivially readable
> > into something opaque. Get me that type qualifier that traps on overflow
> > and write plain C. All this __builtin_overflow garbage is just that,
> > unreadable nonsense.
> 
> It's more readable than container_of(), 

Yeah, no. container_of() is absolutely trivial and very readable.
container_of_const() a lot less so.

(one static_assert() removed)

#define container_of(ptr, type, member) ({                              \
        void *__mptr = (void *)(ptr);                                   \
        ((type *)(__mptr - offsetof(type, member))); })

Which is very clear indeed in what it does. Compare with:

#define struct_size(p, member, count)                                   \
        __builtin_choose_expr(__is_constexpr(count),                    \
                sizeof(*(p)) + flex_array_size(p, member, count),       \
                size_add(sizeof(*(p)), flex_array_size(p, member, count)))

And I still have no idea :-(

> IMO. "give me the struct size
> for variable VAR, which has a flexible array MEMBER, when we have COUNT
> many of them": struct_size(VAR, MEMBER, COUNT). It's more readable, more
> robust, and provides saturation in the face of potential wrap-around.

I'm sure you know what it does. Thing is, I don't care because I can
trivially write it myself and not have to care and I'll have forgotten
all about it the moment I sent this email.

It just doesn't make sense to wrap something as utterly trivial as:

	size = sizeof(*p) + num*sizeof(p->foo);

We're going to have to agree to disagree on this.

Note how I naturally get the order wrong?

[[ There is the whole FMA angle to this, that is, fundamentally this is a
multiply-accumulate, but the problem there is the same that I noted,
there is no fixed order, a+b*c and a*b+c are both very common
definitions -- although I lean towards the latter being the correct one,
given the order in the naming. I suppose this is a long winded way of
saying that:

#define struct_size(p, member, num) \
	mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))

would be *FAR* more readable. And then I still think struct_size() is
less readable than its expansion. ]]

> > > This provides __counted_by coverage, and I think this is important to
> > > gain in ever place we can. Given that this is part of a ring buffer
> > > implementation that is arbitrarily sized, this is exactly the kind of
> > > place I'd like to see __counted_by used. This is a runtime robustness
> > > improvement, so I don't see this a "churn" at all.
> > 
> > Again, mixed in with that other crap. Anyway, remind me wth this
> > __counted_by thing actually does?
> 
> It provides annotation for the compiler to perform run-time bounds
> checking on dynamically sized arrays. i.e. CONFIG_FORTIFY_SOURCE and
> CONFIG_UBSAN_BOUNDS can actually reason about annotated flexible arrays
> instead of just saying "oh no a flexible array, I give up".

Some day I'll have to look at this FORTIFY_SOURCE and see what it
actually does I suppose :/

> > > Peter, for patches 1 and 3, if you'd prefer not to carry them, I could
> > > put them in the hardening tree to keep them out of your way. It seems
> > > clear you don't want patch 2 at all.
> > 
> > I prefer to not have struct_size() anywhere at all. Please just write
> > readable code.
> 
> That ship has sailed, and it has been keeping things at bay for a while
> now. As we make progress on making the compiler able to do this more
> naturally, we can work on replacing struct_size(), but it's in use
> globally and it's useful both for catching runtime mistakes and for
> catching compile-time mistakes (the flexible array has to match the
> variable's struct).

I coulnd't quickly find a single instance in the code I care about. So
nothing is sailing afaict.

Kees Cook June 12, 2024, 7:01 p.m. UTC | #7

On Tue, Jun 11, 2024 at 09:55:42AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:
> 
> > > I really detest this thing because it makes what was trivially readable
> > > into something opaque. Get me that type qualifier that traps on overflow
> > > and write plain C. All this __builtin_overflow garbage is just that,
> > > unreadable nonsense.
> > 
> > It's more readable than container_of(), 
> 
> Yeah, no. container_of() is absolutely trivial and very readable.
> container_of_const() a lot less so.

I mean, we have complex macros in the kernel. This isn't uncommon. Look
at cleanup.h. ;)

But that's why we write kern-doc:

 * struct_size() - Calculate size of structure with trailing flexible
 * array.
 * @p: Pointer to the structure.
 * @member: Name of the array member.
 * @count: Number of elements in the array.

> #define struct_size(p, member, num) \
> 	mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))
> 
> would be *FAR* more readable. And then I still think struct_size() is
> less readable than its expansion. ]]

I'm happy to take patches. And for this bikeshed, this would be better
named under the size_*() helpers which are trying to keep size_t
calculations from overflowing (by saturating). i.e.:

	size_add_mult(sizeof(*p), sizeof(*p->member), num)

Generalized overflow handing (check_add/sub/mul_overflow()) helpers
already exist and requires a destination variable to determine type
sizes. It is not safe to allow C semantics to perform
promotion/truncation, so we have to be explicit.

> Some day I'll have to look at this FORTIFY_SOURCE and see what it
> actually does I suppose :/

LOL. It's basically doing compile-time (__builtin_object_size) and
run-time (__builtin_dynamic_object_size) bounds checking on destination
(and source) object sizes, mainly driven by the mentioned builtins:
https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html

> I coulnd't quickly find a single instance in the code I care about. So
> nothing is sailing afaict.

I have to care about all of Linux's code. :P We generally don't have to
defend the kernel from perf, so the hardening changes tend to end up in
core APIs along with compiler improvements.

Anyway! What about the patch that takes the 2 allocations down to 1?
That seems like an obvious improvement.

-Kees

Peter Zijlstra June 12, 2024, 10:08 p.m. UTC | #8

On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> On Tue, Jun 11, 2024 at 09:55:42AM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 10, 2024 at 02:46:09PM -0700, Kees Cook wrote:
> > 
> > > > I really detest this thing because it makes what was trivially readable
> > > > into something opaque. Get me that type qualifier that traps on overflow
> > > > and write plain C. All this __builtin_overflow garbage is just that,
> > > > unreadable nonsense.
> > > 
> > > It's more readable than container_of(), 
> > 
> > Yeah, no. container_of() is absolutely trivial and very readable.
> > container_of_const() a lot less so.
> 
> I mean, we have complex macros in the kernel. This isn't uncommon. Look
> at cleanup.h. ;)

Yeah, but in those cases the macro actually saves a lot of typing. A
mult and add is something you shouldn't be needing a macro for.

> But that's why we write kern-doc:

Right, but reading comments is asking for trouble. That is, I really
don't even see comments -- I have a built-in pre-processor that filters
them out. I've been burned too many times by misleading or flat out
wrong comments.

> > #define struct_size(p, member, num) \
> > 	mult_add_no_overflow(num, sizeof(p->member), sizeof(*p))
> > 
> > would be *FAR* more readable. And then I still think struct_size() is
> > less readable than its expansion. ]]
> 
> I'm happy to take patches. And for this bikeshed, this would be better
> named under the size_*() helpers which are trying to keep size_t
> calculations from overflowing (by saturating). i.e.:
> 
> 	size_add_mult(sizeof(*p), sizeof(*p->member), num)

Fine I suppose, but what if we want something not size_t? Are we waiting
for the type system extension?

Anyway, I'll take the patch with the above. That at least is readable.

The saturating thing is relying in the allocators never granting INT_MAX
(or whatever size_t actually is) bytes?

> Generalized overflow handing (check_add/sub/mul_overflow()) helpers
> already exist and requires a destination variable to determine type
> sizes. It is not safe to allow C semantics to perform
> promotion/truncation, so we have to be explicit.

Yeah, those are a sodding trainwreck, much like the
__builtin_*_overflow() garbage. Can't we simply have them trap on
overflow and do away with that most terrible calling convention?

If we want to be really fancy we can have a #UD instruction that encodes
a destination register to clear/saturate/whatever.

> > Some day I'll have to look at this FORTIFY_SOURCE and see what it
> > actually does I suppose :/
> 
> LOL. It's basically doing compile-time (__builtin_object_size) and
> run-time (__builtin_dynamic_object_size) bounds checking on destination
> (and source) object sizes, mainly driven by the mentioned builtins:
> https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html

Right, I got that far. I also read most of:

  https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854

But none of that is showing me generated asm for the various cases. As
such, I don't consider myself informed enough.

> Anyway! What about the patch that takes the 2 allocations down to 1?
> That seems like an obvious improvement.

Separate it from the struct_size() nonsense and Cc the author of that
code (Sandipan IIRC) and I might just apply it.

Kees Cook June 12, 2024, 11:23 p.m. UTC | #9

On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > I'm happy to take patches. And for this bikeshed, this would be better
> > named under the size_*() helpers which are trying to keep size_t
> > calculations from overflowing (by saturating). i.e.:
> > 
> > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> 
> Fine I suppose, but what if we want something not size_t? Are we waiting
> for the type system extension?

Because of C's implicit promotion/truncation, we can't do anything
sanely with return values of arbitrary type size; we have to capture the
lvalue type somehow so the checking can happen without C doing silent
garbage.

> The saturating thing is relying in the allocators never granting INT_MAX
> (or whatever size_t actually is) bytes?

The max of size_t is ULONG_MAX, but yes, most of the allocators will
refuse >INT_MAX, but I think vmalloc() is higher, but certainly not
SIZE_MAX, which is the entire virtual memory space. ;)

The saturating thing is two-fold: that we never wrap around SIZE_MAX,
and that the allocator will refuse a SIZE_MAX allocation.

> > LOL. It's basically doing compile-time (__builtin_object_size) and
> > run-time (__builtin_dynamic_object_size) bounds checking on destination
> > (and source) object sizes, mainly driven by the mentioned builtins:
> > https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html
> 
> Right, I got that far. I also read most of:
> 
>   https://discourse.llvm.org/t/rfc-enforcing-bounds-safety-in-c-fbounds-safety/70854

Oh wow, that's serious extra credit. :) It'll also probably be a while
before most of that stuff is even landed in Clang, much less implemented
in GCC. What we _do_ have is the "counted_by" attribute. This was added
to Clang a little while ago and just landed last week in GCC for GCC 15.

> But none of that is showing me generated asm for the various cases. As
> such, I don't consider myself informed enough.

Gotcha. For the compile-time stuff it's all just looking at
known-at-compile-time sizes. So for something like this, we get a
__compiletime_warning() emitted:

	const char src[] = "Hello there";
	char dst[10];

	strscpy(dst, src); /* Compiler yells since src is bigger than dst. */

For run-time checks it's basically just using the regular WARN()
infrastructure with __builtin_dynamic_object_size(). Here's a simplified
userspace example with assert():

https://godbolt.org/z/zMrKnMxn5

The kernel's FORTIFY_SOURCE is much more complex in how it does the
checking, how it does the reporting (for helping people figure out what's
gone weird), etc.

> > Anyway! What about the patch that takes the 2 allocations down to 1?
> > That seems like an obvious improvement.
> 
> Separate it from the struct_size() nonsense and Cc the author of that
> code (Sandipan IIRC) and I might just apply it.

Okay, thanks!

-Kees

Peter Zijlstra June 14, 2024, 10:17 a.m. UTC | #10

On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > I'm happy to take patches. And for this bikeshed, this would be better
> > > named under the size_*() helpers which are trying to keep size_t
> > > calculations from overflowing (by saturating). i.e.:
> > > 
> > > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > 
> > Fine I suppose, but what if we want something not size_t? Are we waiting
> > for the type system extension?
> 
> Because of C's implicit promotion/truncation, we can't do anything
> sanely with return values of arbitrary type size; we have to capture the
> lvalue type somehow so the checking can happen without C doing silent
> garbage.

So sizeof() returns the native (built-in) size_t, right? If that type
the nooverflow qualifier on, then:

	sizeof(*p) + num*sizeof(p->foo[0])

should all get the nooverflow semantics right? Because size_t is
effectively 'nooverflow unsigned long' the multiplication should promote
'num' to some 'long'.

Now, I've re-read the rules and I don't see qualifiers mentioned, so
can't we state that the overflow/nooverflow qualifiers are to be
preserved on (implicit) promotion and when nooverflow and overflow are
combined the 'safe' nooverflow takes precedence?

I mean, when we're adding qualifiers we can make up rules about them
too, right?

If 'people' don't want to adorn the built-in size_t, we can always do
something like:

#define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))

and 'fix' it ourselves.

> > But none of that is showing me generated asm for the various cases. As
> > such, I don't consider myself informed enough.
> 
> Gotcha. For the compile-time stuff it's all just looking at
> known-at-compile-time sizes. So for something like this, we get a
> __compiletime_warning() emitted:
> 
> 	const char src[] = "Hello there";
> 	char dst[10];
> 
> 	strscpy(dst, src); /* Compiler yells since src is bigger than dst. */
> 
> For run-time checks it's basically just using the regular WARN()
> infrastructure with __builtin_dynamic_object_size(). Here's a simplified
> userspace example with assert():
> 
> https://godbolt.org/z/zMrKnMxn5
> 
> The kernel's FORTIFY_SOURCE is much more complex in how it does the
> checking, how it does the reporting (for helping people figure out what's
> gone weird), etc.

Thanks, I'll go have a look at that.

Martin Uecker June 15, 2024, 4:09 p.m. UTC | #11

Am Freitag, dem 14.06.2024 um 12:17 +0200 schrieb Peter Zijlstra:
> On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> > On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > > I'm happy to take patches. And for this bikeshed, this would be better
> > > > named under the size_*() helpers which are trying to keep size_t
> > > > calculations from overflowing (by saturating). i.e.:
> > > > 
> > > > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > > 
> > > Fine I suppose, but what if we want something not size_t? Are we waiting
> > > for the type system extension?
> > 
> > Because of C's implicit promotion/truncation, we can't do anything
> > sanely with return values of arbitrary type size; we have to capture the
> > lvalue type somehow so the checking can happen without C doing silent
> > garbage.

What is the specific problem here?

> 
> So sizeof() returns the native (built-in) size_t, right? If that type
> the nooverflow qualifier on, then:
> 
> 	sizeof(*p) + num*sizeof(p->foo[0])
> 
> should all get the nooverflow semantics right? Because size_t is
> effectively 'nooverflow unsigned long' the multiplication should promote
> 'num' to some 'long'.
> 
> Now, I've re-read the rules and I don't see qualifiers mentioned, so
> can't we state that the overflow/nooverflow qualifiers are to be
> preserved on (implicit) promotion and when nooverflow and overflow are
> combined the 'safe' nooverflow takes precedence?
> 
> I mean, when we're adding qualifiers we can make up rules about them
> too, right?

It should probably be a type attribute.

> 
> If 'people' don't want to adorn the built-in size_t, we can always do
> something like:
> 
> #define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))
> 
> and 'fix' it ourselves.

This is likely a stupid question, but making it signed 
wouldn't work?   Or is a signed size_t too small  
or some architectures? Or would this change break too much?


Martin

> 
> > > But none of that is showing me generated asm for the various cases. As
> > > such, I don't consider myself informed enough.
> > 
> > Gotcha. For the compile-time stuff it's all just looking at
> > known-at-compile-time sizes. So for something like this, we get a
> > __compiletime_warning() emitted:
> > 
> > 	const char src[] = "Hello there";
> > 	char dst[10];
> > 
> > 	strscpy(dst, src); /* Compiler yells since src is bigger than dst. */
> > 
> > For run-time checks it's basically just using the regular WARN()
> > infrastructure with __builtin_dynamic_object_size(). Here's a simplified
> > userspace example with assert():
> > 
> > https://godbolt.org/z/zMrKnMxn5
> > 
> > The kernel's FORTIFY_SOURCE is much more complex in how it does the
> > checking, how it does the reporting (for helping people figure out what's
> > gone weird), etc.
> 
> Thanks, I'll go have a look at that.

Kees Cook June 17, 2024, 5:19 p.m. UTC | #12

On Fri, Jun 14, 2024 at 12:17:08PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> > On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > > I'm happy to take patches. And for this bikeshed, this would be better
> > > > named under the size_*() helpers which are trying to keep size_t
> > > > calculations from overflowing (by saturating). i.e.:
> > > > 
> > > > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > > 
> > > Fine I suppose, but what if we want something not size_t? Are we waiting
> > > for the type system extension?
> > 
> > Because of C's implicit promotion/truncation, we can't do anything
> > sanely with return values of arbitrary type size; we have to capture the
> > lvalue type somehow so the checking can happen without C doing silent
> > garbage.
> 
> So sizeof() returns the native (built-in) size_t, right? If that type
> the nooverflow qualifier on, then:
> 
> 	sizeof(*p) + num*sizeof(p->foo[0])
> 
> should all get the nooverflow semantics right? Because size_t is
> effectively 'nooverflow unsigned long' the multiplication should promote
> 'num' to some 'long'.

Hmmm. This is an interesting point. I'll see what Justin has found as
he's been working on limiting the overflow sanitizer to specific types.

It doesn't help this (unfortunately common) code pattern, though:

	int size;

	size = sizeof(*p) + num*sizeof(p->foo[0]);
	p = kmalloc(size, GFP_KERNEL);

But that was going to be a problem either way.

> Now, I've re-read the rules and I don't see qualifiers mentioned, so
> can't we state that the overflow/nooverflow qualifiers are to be
> preserved on (implicit) promotion and when nooverflow and overflow are
> combined the 'safe' nooverflow takes precedence?
> 
> I mean, when we're adding qualifiers we can make up rules about them
> too, right?

Yup, that is the design of the "wraps" attribute (though it is the
reverse: it _allows_ for wrap-around, since we want to the default state
to be mitigation).

> If 'people' don't want to adorn the built-in size_t, we can always do
> something like:
> 
> #define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))
> 
> and 'fix' it ourselves.

Right, though my hope is still we get the result of "nooverflow" by
marking that which was expected to overflow.

Kees Cook June 17, 2024, 5:28 p.m. UTC | #13

On Sat, Jun 15, 2024 at 06:09:07PM +0200, Martin Uecker wrote:
> Am Freitag, dem 14.06.2024 um 12:17 +0200 schrieb Peter Zijlstra:
> > On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> > > On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > > > I'm happy to take patches. And for this bikeshed, this would be better
> > > > > named under the size_*() helpers which are trying to keep size_t
> > > > > calculations from overflowing (by saturating). i.e.:
> > > > > 
> > > > > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > > > 
> > > > Fine I suppose, but what if we want something not size_t? Are we waiting
> > > > for the type system extension?
> > > 
> > > Because of C's implicit promotion/truncation, we can't do anything
> > > sanely with return values of arbitrary type size; we have to capture the
> > > lvalue type somehow so the checking can happen without C doing silent
> > > garbage.
> 
> What is the specific problem here?

This particular complaint is about not being able to "discover" the
lvalue type during an assignment, which means we can't create helper
macros that Just Work. For example, in:

	void foo(some_type arg);

	foo(add_mul(base, size, count));

There is no way to write "add_mul" so that it is checking for overflow
of the "some_type" type. The behavior of such a macro changes if it's
doing u8, int, size_t, etc.

The existing solution to this is to use macros (and builtins) that
cannot be composed:

	add_mul(base, size, count, &result);
	foo(result);

Here, the macro can examine the type of "result" and perform the correct
overflow handling for that type. But lots of developers (rightly)
dislike this style of "assignment side-effect".

Additionally, it can't be composed:

	add(base, mul(size, count));

But, using type attributes we have much more flexibility. Hence, the
proposed "wraps" attribute:
https://github.com/llvm/llvm-project/pull/86618

> This is likely a stupid question, but making it signed 
> wouldn't work?   Or is a signed size_t too small  
> or some architectures? Or would this change break too much?

The ssize_t gets used in some places already, but since size_t is used
for address calculations too, I don't think we can universally switch
it.

Peter Zijlstra June 18, 2024, 8:22 a.m. UTC | #14

On Mon, Jun 17, 2024 at 10:28:20AM -0700, Kees Cook wrote:

> But, using type attributes we have much more flexibility. Hence, the
> proposed "wraps" attribute:
> https://github.com/llvm/llvm-project/pull/86618

So I still think that's going about things backwards.

unsigned explicitly wraps. signed is UB.

When using -fwrapv signed is well defined as 2s complement, which takes
away the UB and makes it implicitly wrap.

When extending the language, it is important to not break existing code,
so the default must remain wrap. This in turn means you need to add a
'nowrap' thingy.

Also, I would very much not make this an attribute, but a full type
qualifier. Furthermore, add type promotion rules to ensure nowrap takes
precedence and is preserved throughout expressions. Such that:

  'long' + 'nowrap int' -> 'nowrap long'

Then, once you have this, you can go do things like:

typedef nowrap unsigned long size_t;
#define sizeof(x) ((size_t)sizeof(x))

and things will just work. Hmm?

Peter Zijlstra June 18, 2024, 8:28 a.m. UTC | #15

On Mon, Jun 17, 2024 at 10:19:15AM -0700, Kees Cook wrote:
> On Fri, Jun 14, 2024 at 12:17:08PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 12, 2024 at 04:23:31PM -0700, Kees Cook wrote:
> > > On Thu, Jun 13, 2024 at 12:08:21AM +0200, Peter Zijlstra wrote:
> > > > On Wed, Jun 12, 2024 at 12:01:19PM -0700, Kees Cook wrote:
> > > > > I'm happy to take patches. And for this bikeshed, this would be better
> > > > > named under the size_*() helpers which are trying to keep size_t
> > > > > calculations from overflowing (by saturating). i.e.:
> > > > > 
> > > > > 	size_add_mult(sizeof(*p), sizeof(*p->member), num)
> > > > 
> > > > Fine I suppose, but what if we want something not size_t? Are we waiting
> > > > for the type system extension?
> > > 
> > > Because of C's implicit promotion/truncation, we can't do anything
> > > sanely with return values of arbitrary type size; we have to capture the
> > > lvalue type somehow so the checking can happen without C doing silent
> > > garbage.
> > 
> > So sizeof() returns the native (built-in) size_t, right? If that type
> > the nooverflow qualifier on, then:
> > 
> > 	sizeof(*p) + num*sizeof(p->foo[0])
> > 
> > should all get the nooverflow semantics right? Because size_t is
> > effectively 'nooverflow unsigned long' the multiplication should promote
> > 'num' to some 'long'.
> 
> Hmmm. This is an interesting point. I'll see what Justin has found as
> he's been working on limiting the overflow sanitizer to specific types.
> 
> It doesn't help this (unfortunately common) code pattern, though:
> 
> 	int size;
> 
> 	size = sizeof(*p) + num*sizeof(p->foo[0]);
> 	p = kmalloc(size, GFP_KERNEL);
> 
> But that was going to be a problem either way.

Well, you can add a warning on implicitly casting away nooverflow.

New qualifier, we get to make up the rules etc.. it probably means we
need to change the signature of the allocator functions to take a
'nooverflow' type, otherwise those will trigger this new warning, but
that should not be a problem.

> > Now, I've re-read the rules and I don't see qualifiers mentioned, so
> > can't we state that the overflow/nooverflow qualifiers are to be
> > preserved on (implicit) promotion and when nooverflow and overflow are
> > combined the 'safe' nooverflow takes precedence?
> > 
> > I mean, when we're adding qualifiers we can make up rules about them
> > too, right?
> 
> Yup, that is the design of the "wraps" attribute (though it is the
> reverse: it _allows_ for wrap-around, since we want to the default state
> to be mitigation).

Yeah, I feel strongly about that (just mailed you in the other
sub-thread) that this is the wrong way around.

> > If 'people' don't want to adorn the built-in size_t, we can always do
> > something like:
> > 
> > #define sizeof(x) ((nooverflow unsigned long)(sizeof(x)))
> > 
> > and 'fix' it ourselves.
> 
> Right, though my hope is still we get the result of "nooverflow" by
> marking that which was expected to overflow.

You cannot sell that as a proper language extension because it will
break world+dog.

Kees Cook June 20, 2024, 6:26 p.m. UTC | #16

On Tue, Jun 18, 2024 at 10:22:42AM +0200, Peter Zijlstra wrote:
> On Mon, Jun 17, 2024 at 10:28:20AM -0700, Kees Cook wrote:
> 
> > But, using type attributes we have much more flexibility. Hence, the
> > proposed "wraps" attribute:
> > https://github.com/llvm/llvm-project/pull/86618
> 
> So I still think that's going about things backwards.
> 
> unsigned explicitly wraps. signed is UB.
> 
> When using -fwrapv signed is well defined as 2s complement, which takes
> away the UB and makes it implicitly wrap.
> 
> When extending the language, it is important to not break existing code,
> so the default must remain wrap. This in turn means you need to add a
> 'nowrap' thingy.

Well, we still disagree about this, but I guess we'll see how the size_t
work comes along. Maybe I will come to agree with you. :)

> Also, I would very much not make this an attribute, but a full type
> qualifier. Furthermore, add type promotion rules to ensure nowrap takes
> precedence and is preserved throughout expressions. Such that:
> 
>   'long' + 'nowrap int' -> 'nowrap long'
> 
> Then, once you have this, you can go do things like:
> 
> typedef nowrap unsigned long size_t;
> #define sizeof(x) ((size_t)sizeof(x))
> 
> and things will just work. Hmm?

Right. Currently this is being plumbed through the sanitizers, so the
type selection will happen there (for this version of it). The current
experiment is to basically tell the sanitizers to ignore all types
except a given list, and our initial list will be just size_t.

Before this, Justin is finishing up the initial set of idiom exclusions.
.e.g.:

	size_t var, offset;
	...
	if (var + offset < var) { ... }

This will not freak out if "var + offset" wraps.

And "negative unsigned constants" will be ignored:

	#define MASK	-2ULL
	...
	size_t mask = MASK;

And loop post decrements wrapping will get ignored too:

	size_t count = ...;
	...
	while (count--) { ... }

After we get it all nailed down, we'll see if anything new pops up. :)

[v4,0/3] Hardening perf subsystem

Message

Comments