mbox series

[v3,00/11] Introduce Simple atomic counters

Message ID cover.1602209970.git.skhan@linuxfoundation.org (mailing list archive)
Headers show
Series Introduce Simple atomic counters | expand

Message

Shuah Khan Oct. 9, 2020, 3:55 p.m. UTC
This patch series is a result of discussion at the refcount_t BOF
the Linux Plumbers Conference. In this discussion, we identified
a need for looking closely and investigating atomic_t usages in
the kernel when it is used strictly as a counter without it
controlling object lifetimes and state changes.

There are a number of atomic_t usages in the kernel where atomic_t api
is used strictly for counting and not for managing object lifetime. In
some cases, atomic_t might not even be needed.

The purpose of these counters is to clearly differentiate atomic_t
counters from atomic_t usages that guard object lifetimes, hence prone
to overflow and underflow errors. It allows tools that scan for underflow
and overflow on atomic_t usages to detect overflow and underflows to scan
just the cases that are prone to errors.

Simple atomic counters api provides interfaces for simple atomic counters
that just count, and don't guard resource lifetimes. The interfaces are
built on top of atomic_t api, providing a smaller subset of atomic_t
interfaces necessary to support simple counters.
    
Counter wraps around to INT_MIN when it overflows and should not be used
to guard resource lifetimes, device usage and open counts that control
state changes, and pm states. Overflowing to INT_MIN is consistent with
the atomic_t api, which it is built on top of.
    
Using counter_atomic* to guard lifetimes could lead to use-after free
when it overflows and undefined behavior when used to manage state
changes and device usage/open states.

This patch series introduces Simple atomic counters. Counter atomic ops
leverage atomic_t and provide a sub-set of atomic_t ops.

In addition this patch series converts a few drivers to use the new api.
The following criteria is used for select variables for conversion:

1. Variable doesn't guard object lifetimes, manage state changes e.g:
   device usage counts, device open counts, and pm states.
2. Variable is used for stats and counters.
3. The conversion doesn't change the overflow behavior.

Note: Would like to get this into Linux 5.10-rc1 so we can continue
updating drivers that can be updated to use this API. If this all looks
good, Kees, would you like to take this through your tree or would you
like to take this through mine.

Changes since Patch v2:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
   the patches with the tags.
-- Minor changes to address Greg's comment to remove default from
   Kconfig
-- Added Copyrights to new files
Updates to address comments on v2 from Kees Cook
-- Updated Patch 1/11 to make clear that the counter wraps around to
   INT_MIN and that this behavior is consistent with the atomic_t
   api, on which this counter built api built on top of.
-- Other patch change logs updated with the correct wrap around
   behavior.
-- Patch 1/11 is updated to add tests with constants for overflow
   and underflow.
-- Patch 8/11 - added inits for the stat counters
-- Patch 10/11 - fixes the vmci_num_guest_devices != 0 to >0 which is
   safer than checking for !=0. 
 
Changes since Patch v1
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
   the patches with the tags.
-- Addressed Kees's  and Joel's comments:
   1. Removed dec_return interfaces
   2. Removed counter_simple interfaces to be added later with changes
      to drivers that use them (if any).

Changes since RFC:
-- Thanks for reviews and reviewed-by, and Acked-by tags. Updated
   the patches with the tags.
-- Addressed Kees's comments:
   1. Non-atomic counters renamed to counter_simple32 and counter_simple64
      to clearly indicate size.
   2. Added warning for counter_simple* usage and it should be used only
      when there is no need for atomicity.
   3. Renamed counter_atomic to counter_atomic32 to clearly indicate size.
   4. Renamed counter_atomic_long to counter_atomic64 and it now uses
      atomic64_t ops and indicates size.
   5. Test updated for the API renames.
   6. Added helper functions for test results printing
   7. Verified that the test module compiles in kunit env. and test
      module can be loaded to run the test.
   8. Updated Documentation to reflect the intent to make the API
      restricted so it can never be used to guard object lifetimes
      and state management. I left _return ops for now, inc_return
      is necessary for now as per the discussion we had on this topic.
-- Updated driver patches with API name changes.
-- We discussed if binder counters can be non-atomic. For now I left
   them the same as the RFC patch - using counter_atomic32
-- Unrelated to this patch series:
   The patch series review uncovered improvements could be made to
   test_async_driver_probe and vmw_vmci/vmci_guest. I will track
   these for fixing later.

Shuah Khan (11):
  counters: Introduce counter_atomic* counters
  selftests:lib:test_counters: add new test for counters
  drivers/base: convert deferred_trigger_count and probe_count to
    counter_atomic32
  drivers/base/devcoredump: convert devcd_count to counter_atomic32
  drivers/acpi: convert seqno counter_atomic32
  drivers/acpi/apei: convert seqno counter_atomic32
  drivers/android/binder: convert stats, transaction_log to
    counter_atomic32
  drivers/base/test/test_async_driver_probe: convert to use
    counter_atomic32
  drivers/char/ipmi: convert stats to use counter_atomic32
  drivers/misc/vmw_vmci: convert num guest devices counter to
    counter_atomic32
  drivers/edac: convert pci counters to counter_atomic32

 Documentation/core-api/counters.rst          | 109 ++++++++++++
 MAINTAINERS                                  |   8 +
 drivers/acpi/acpi_extlog.c                   |   5 +-
 drivers/acpi/apei/ghes.c                     |   5 +-
 drivers/android/binder.c                     |  41 ++---
 drivers/android/binder_internal.h            |   3 +-
 drivers/base/dd.c                            |  19 +-
 drivers/base/devcoredump.c                   |   5 +-
 drivers/base/test/test_async_driver_probe.c  |  26 +--
 drivers/char/ipmi/ipmi_msghandler.c          |   9 +-
 drivers/char/ipmi/ipmi_si_intf.c             |   9 +-
 drivers/edac/edac_pci.h                      |   5 +-
 drivers/edac/edac_pci_sysfs.c                |  28 +--
 drivers/misc/vmw_vmci/vmci_guest.c           |   9 +-
 include/linux/counters.h                     | 176 +++++++++++++++++++
 lib/Kconfig                                  |   9 +
 lib/Makefile                                 |   1 +
 lib/test_counters.c                          | 162 +++++++++++++++++
 tools/testing/selftests/lib/Makefile         |   1 +
 tools/testing/selftests/lib/config           |   1 +
 tools/testing/selftests/lib/test_counters.sh |  10 ++
 21 files changed, 567 insertions(+), 74 deletions(-)
 create mode 100644 Documentation/core-api/counters.rst
 create mode 100644 include/linux/counters.h
 create mode 100644 lib/test_counters.c
 create mode 100755 tools/testing/selftests/lib/test_counters.sh

Comments

Kees Cook Oct. 9, 2020, 6:03 p.m. UTC | #1
On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
> Note: Would like to get this into Linux 5.10-rc1 so we can continue
> updating drivers that can be updated to use this API. If this all looks
> good, Kees, would you like to take this through your tree or would you
> like to take this through mine.

I'd mentioned this in the v2, but yes, please take via your trees. :)

I'm glad to see this landing!
Shuah Khan Oct. 9, 2020, 7:02 p.m. UTC | #2
On 10/9/20 12:03 PM, Kees Cook wrote:
> On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
>> Note: Would like to get this into Linux 5.10-rc1 so we can continue
>> updating drivers that can be updated to use this API. If this all looks
>> good, Kees, would you like to take this through your tree or would you
>> like to take this through mine.
> 
> I'd mentioned this in the v2, but yes, please take via your trees. :)
> 

Sorry. I missed that. I will get take this through my trees.

> I'm glad to see this landing!
> 

Thanks for reviews and brainstorming ideas. It has been lot of fun
doing this work.

thanks,
-- Shuah
Peter Zijlstra Oct. 9, 2020, 7:37 p.m. UTC | #3
On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
> This patch series is a result of discussion at the refcount_t BOF
> the Linux Plumbers Conference. In this discussion, we identified
> a need for looking closely and investigating atomic_t usages in
> the kernel when it is used strictly as a counter without it
> controlling object lifetimes and state changes.
> 
> There are a number of atomic_t usages in the kernel where atomic_t api
> is used strictly for counting and not for managing object lifetime. In
> some cases, atomic_t might not even be needed.

Then the right patch is to not user atomic_t in those cases.

> The purpose of these counters is to clearly differentiate atomic_t
> counters from atomic_t usages that guard object lifetimes, hence prone
> to overflow and underflow errors. It allows tools that scan for underflow
> and overflow on atomic_t usages to detect overflow and underflows to scan
> just the cases that are prone to errors.

Guarding lifetimes is what we got refcount_t for.

> Simple atomic counters api provides interfaces for simple atomic counters
> that just count, and don't guard resource lifetimes. The interfaces are
> built on top of atomic_t api, providing a smaller subset of atomic_t
> interfaces necessary to support simple counters.

To what actual purpose?!? AFACIT its pointless wrappery, it gets us
nothing.

> Counter wraps around to INT_MIN when it overflows and should not be used
> to guard resource lifetimes, device usage and open counts that control
> state changes, and pm states. Overflowing to INT_MIN is consistent with
> the atomic_t api, which it is built on top of.
>     
> Using counter_atomic* to guard lifetimes could lead to use-after free
> when it overflows and undefined behavior when used to manage state
> changes and device usage/open states.
> 
> This patch series introduces Simple atomic counters. Counter atomic ops
> leverage atomic_t and provide a sub-set of atomic_t ops.

Thanks for Cc'ing the atomic maintainers :/

NAK.
Kees Cook Oct. 9, 2020, 8:45 p.m. UTC | #4
On Fri, Oct 09, 2020 at 09:37:46PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
> > Simple atomic counters api provides interfaces for simple atomic counters
> > that just count, and don't guard resource lifetimes. The interfaces are
> > built on top of atomic_t api, providing a smaller subset of atomic_t
> > interfaces necessary to support simple counters.
> 
> To what actual purpose?!? AFACIT its pointless wrappery, it gets us
> nothing.

It's not pointless. There is value is separating types for behavioral
constraint to avoid flaws. atomic_t provides a native operation. We gained
refcount_t for the "must not wrap" type, and this gets us the other side
of that behavioral type, which is "wrapping is expected". Separating the
atomic_t uses allows for a clearer path to being able to reason about
code flow, whether it be a human or a static analyzer.

The counter wrappers add nothing to the image size, and only serve to
confine the API to one that cannot be used for lifetime management.

Once conversions are done, we have a clean line between refcounting
and statistical atomics, which means we have a much lower chance of
introducing new flaws (and maybe we'll fix flaws during the conversion,
which we've certainly seen before when doing this stricter type/language
changes).

I don't see why this is an objectionable goal.
Peter Zijlstra Oct. 10, 2020, 11:09 a.m. UTC | #5
On Fri, Oct 09, 2020 at 01:45:43PM -0700, Kees Cook wrote:
> On Fri, Oct 09, 2020 at 09:37:46PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
> > > Simple atomic counters api provides interfaces for simple atomic counters
> > > that just count, and don't guard resource lifetimes. The interfaces are
> > > built on top of atomic_t api, providing a smaller subset of atomic_t
> > > interfaces necessary to support simple counters.
> > 
> > To what actual purpose?!? AFACIT its pointless wrappery, it gets us
> > nothing.
> 
> It's not pointless. There is value is separating types for behavioral
> constraint to avoid flaws. atomic_t provides a native operation. We gained
> refcount_t for the "must not wrap" type, and this gets us the other side
> of that behavioral type, which is "wrapping is expected". Separating the
> atomic_t uses allows for a clearer path to being able to reason about
> code flow, whether it be a human or a static analyzer.

refcount_t got us actual rutime exceptions that atomic_t doesn't. This
propsal gets us nothing.

atomic_t is very much expected to wrap.

> The counter wrappers add nothing to the image size, and only serve to
> confine the API to one that cannot be used for lifetime management.

It doesn't add anything period. It doesn't get us new behaviour, it
splits a 'can wrap' use-case from a 'can wrap' type. That's sodding
pointless.

Worse, it mixes 2 unrelated cases into one type, which just makes a
mockery of things (all the inc_return users are not statistics, some
might even mis-behave if they wrap).

> Once conversions are done, we have a clean line between refcounting
> and statistical atomics, which means we have a much lower chance of
> introducing new flaws (and maybe we'll fix flaws during the conversion,
> which we've certainly seen before when doing this stricter type/language
> changes).
> 
> I don't see why this is an objectionable goal.

People can and will always find a way to mess things up.

Only add types when you get behavioural changes, otherwise it's
pointless noise.

My NAK stands.
Shuah Khan Oct. 14, 2020, 2:12 a.m. UTC | #6
On 10/10/20 5:09 AM, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 01:45:43PM -0700, Kees Cook wrote:
>> On Fri, Oct 09, 2020 at 09:37:46PM +0200, Peter Zijlstra wrote:
>>> On Fri, Oct 09, 2020 at 09:55:55AM -0600, Shuah Khan wrote:
>>>> Simple atomic counters api provides interfaces for simple atomic counters
>>>> that just count, and don't guard resource lifetimes. The interfaces are
>>>> built on top of atomic_t api, providing a smaller subset of atomic_t
>>>> interfaces necessary to support simple counters.
>>>
>>> To what actual purpose?!? AFACIT its pointless wrappery, it gets us
>>> nothing.
>>
>> It's not pointless. There is value is separating types for behavioral
>> constraint to avoid flaws. atomic_t provides a native operation. We gained
>> refcount_t for the "must not wrap" type, and this gets us the other side
>> of that behavioral type, which is "wrapping is expected". Separating the
>> atomic_t uses allows for a clearer path to being able to reason about
>> code flow, whether it be a human or a static analyzer.
> 
> refcount_t got us actual rutime exceptions that atomic_t doesn't. This
> propsal gets us nothing.
> 
> atomic_t is very much expected to wrap.
> 
>> The counter wrappers add nothing to the image size, and only serve to
>> confine the API to one that cannot be used for lifetime management.
> 
> It doesn't add anything period. It doesn't get us new behaviour, it
> splits a 'can wrap' use-case from a 'can wrap' type. That's sodding
> pointless.
> 

They don't add any new behavior, As Kees mentioned they do give us a
way to clearly differentiate atomic usages that can wrap.

Let's discuss the problem at hand before dismissing it as pointless.

> Worse, it mixes 2 unrelated cases into one type, which just makes a
> mockery of things (all the inc_return users are not statistics, some
> might even mis-behave if they wrap).
> 

You are right that all inc_return usages aren't statistics. There are
3 distinct usages:

1. Stats
2. Cases where wrapping is fine
3. Cases where wrapping could be a problem. In which case, this API
    shouldn't be used.

There is no need to keep inc_return in this API as such. I included it
so it can be used for above cases 1 and 2, so the users don't have to
call inc() followed by read(). It can be left out of the API.

The atomic_t usages in the kernel fall into the following categories:

1. Stats (tolerance for accuracy determines whether they need to be
    atomic or not). RFC version included non-atomic API for cases
    when lossiness is acceptable. All these cases use/need just init
    and inc. There are two variations in this case:

    a. No checks for wrapping. Use signed value.
    b. No checks for wrapping, but return unsigned.

2. Reference counters that release resource and rapping could result
    in use-after-free type problems. There are two variations in this
    case:

    a. Increments and decrements aren't bounded.
    b. Increments and decrements are bounded.

    Currently tools that flag unsafe atomic_t usages that are candidates
    for refcount_t conversions don't make a distinction between the two.

    The second case, since increments and decrements are bounded, it is
    safe to continue to use it. At the moment there is no good way to
    tell them apart other than looking at each of these cases.

3. Reference counters that manage/control states. Wrapping is a problem
    in this case, as it could lead to undefined behavior. These cases
    don't use test and free, use inc/dec. At the moment there is no good
    way to tell them apart other than looking at each of these cases.
    This is addressed by REFCOUNT_SATURATED case.

This API addresses 1a. Stats. No checks for wrapping. Use signed value
at the moment with plan to add support for unsigned for cases where
unsigned is being used.

It is possible to cover 2b in this API, so it becomes easier to make a
clear distinction the two cases and we can focus on only the atomic_t
cases that need to converted to refcount_t. This is easy to do by
allowing max. threshold for the variable and checking against that
and not letting it go above it.

There are several atomic_t usages that use just:

-- init or set and inc
-- init or set and inc/dec (including the ones that manage state)
-- Increments and decrements are bounded

Creating a sub-set of atomic_t api would help us with differentiate
these cases and make it easy for us identify and fix cases where
refcount_t should be used.

Would you be open to considering a subset if it addresses 2b and
unsigned returns for stats?

thanks,
-- Shuah
Peter Zijlstra Oct. 14, 2020, 9:17 a.m. UTC | #7
On Tue, Oct 13, 2020 at 08:12:20PM -0600, Shuah Khan wrote:

> They don't add any new behavior, As Kees mentioned they do give us a
> way to clearly differentiate atomic usages that can wrap.

No it doesn't! atomic_t can wrap, this thing can wrap, no distinction.

All it does is fragment the API and sow confusion. FOR NO BENEFIT.

> > Worse, it mixes 2 unrelated cases into one type, which just makes a
> > mockery of things (all the inc_return users are not statistics, some
> > might even mis-behave if they wrap).
> > 
> 
> You are right that all inc_return usages aren't statistics. There are
> 3 distinct usages:
> 
> 1. Stats
> 2. Cases where wrapping is fine
> 3. Cases where wrapping could be a problem. In which case, this API
>    shouldn't be used.

And yet, afaict patch 4 is case 3...

> There is no need to keep inc_return in this API as such. I included it
> so it can be used for above cases 1 and 2, so the users don't have to
> call inc() followed by read(). It can be left out of the API.
> 
> The atomic_t usages in the kernel fall into the following categories:
> 
> 1. Stats (tolerance for accuracy determines whether they need to be
>    atomic or not). RFC version included non-atomic API for cases
>    when lossiness is acceptable. All these cases use/need just init
>    and inc. There are two variations in this case:
> 
>    a. No checks for wrapping. Use signed value.
>    b. No checks for wrapping, but return unsigned.
> 
> 2. Reference counters that release resource and rapping could result
>    in use-after-free type problems. There are two variations in this
>    case:
> 
>    a. Increments and decrements aren't bounded.
>    b. Increments and decrements are bounded.
> 
>    Currently tools that flag unsafe atomic_t usages that are candidates
>    for refcount_t conversions don't make a distinction between the two.
> 
>    The second case, since increments and decrements are bounded, it is
>    safe to continue to use it. At the moment there is no good way to
>    tell them apart other than looking at each of these cases.
> 
> 3. Reference counters that manage/control states. Wrapping is a problem
>    in this case, as it could lead to undefined behavior. These cases
>    don't use test and free, use inc/dec. At the moment there is no good
>    way to tell them apart other than looking at each of these cases.
>    This is addressed by REFCOUNT_SATURATED case.

Wrong! The atomic usage in mutex doesn't fall in any of those
categories.


The only thing you're all saying that makes sense is that unintentional
wrapping can have bad consequences, the rest is pure confusion.

Focus on the non-wrapping cases, _everything_ else is not going
anywhere.

So audit the kernel, find the cases that should not wrap, categorize and
create APIs for them that trap the wrapping. But don't go around
confusing things that don't need confusion.
Kees Cook Oct. 14, 2020, 11:31 p.m. UTC | #8
On Wed, Oct 14, 2020 at 11:17:20AM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2020 at 08:12:20PM -0600, Shuah Khan wrote:
> 
> > They don't add any new behavior, As Kees mentioned they do give us a
> > way to clearly differentiate atomic usages that can wrap.
> 
> No it doesn't! atomic_t can wrap, this thing can wrap, no distinction.
> 
> All it does is fragment the API and sow confusion. FOR NO BENEFIT.

I really don't see it this way. It's a distinct subset of the atomic_t
API. The trouble that has existed here has been with an atomic_t being
originally used NOT for lifetime management, that mutates into something
like that because of the available API, but doing so without realizing
it. atomic_t gets used for all kinds of algorithms, and the "counter"
type is way too easily accidentally transformed into a "lifetime
tracker" and we get bugs.

If we have a distinct type for wrapping-counters that limits the API,
then it is much harder for folks to shoot themselves in the foot. I don't
see why this is so bad: we end up with safer usage, more easily auditable
code behavior ("how was this atomic_t instance _intended_ to be used?"),
and no change in binary size.

> > There is no need to keep inc_return in this API as such. I included it
> > so it can be used for above cases 1 and 2, so the users don't have to
> > call inc() followed by read(). It can be left out of the API.

I go back and forth on this, but after looking at these instances,
it makes sense to have inc_return(), for where counters are actually
"serial numbers". An argument could be made[1], however, that such uses
should not end up in the position of _reusing_ earlier identifiers, which
means it's actually can't wrap. (And some cases just need u64 to make this
happen[2] -- and in that specific case, don't even need to be atomic_t).

[1] https://lore.kernel.org/lkml/202010071334.8298F3FA7@keescook/
[2] https://git.kernel.org/linus/d1e7fd6462ca9fc76650fbe6ca800e35b24267da

> Wrong! The atomic usage in mutex doesn't fall in any of those
> categories.

But the atomic usage in mutex is *IN* mutex -- it's a separate data
type, etc. We don't build mutexes manually, so why build counters
manually?

> The only thing you're all saying that makes sense is that unintentional
> wrapping can have bad consequences, the rest is pure confusion.
> 
> Focus on the non-wrapping cases, _everything_ else is not going
> anywhere.

I view this as a way to do so: this subset of wrapping cases is being
identified and removed from the pool of all the atomic_t cases so that
they will have been classified, and we can continue to narrow down all
the atomic_t uses to find any potentially mis-used non-wrapping cases.

The other option is adding some kind of attribute to the declarations
(which gets us the annotation) but doesn't provide a limit to the API.
(e.g. no counter should ever call dec_return).

> So audit the kernel, find the cases that should not wrap, categorize and
> create APIs for them that trap the wrapping. But don't go around
> confusing things that don't need confusion.

That's what's happening here. But as it turns out, it's easier to do
this by employing both the process of elimination (mark the counters)
and direct identification (mark the refcount_t). Then the pool of
"unannotated" atomic_t instances continues to shrink.
Peter Zijlstra Oct. 16, 2020, 10:53 a.m. UTC | #9
On Wed, Oct 14, 2020 at 04:31:42PM -0700, Kees Cook wrote:
> On Wed, Oct 14, 2020 at 11:17:20AM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 13, 2020 at 08:12:20PM -0600, Shuah Khan wrote:
> > 
> > > They don't add any new behavior, As Kees mentioned they do give us a
> > > way to clearly differentiate atomic usages that can wrap.
> > 
> > No it doesn't! atomic_t can wrap, this thing can wrap, no distinction.
> > 
> > All it does is fragment the API and sow confusion. FOR NO BENEFIT.
> 
> I really don't see it this way. It's a distinct subset of the atomic_t
> API. The trouble that has existed here has been with an atomic_t being
> originally used NOT for lifetime management, that mutates into something
> like that because of the available API, but doing so without realizing
> it. atomic_t gets used for all kinds of algorithms, and the "counter"
> type is way too easily accidentally transformed into a "lifetime
> tracker" and we get bugs.
> 
> If we have a distinct type for wrapping-counters that limits the API,
> then it is much harder for folks to shoot themselves in the foot. I don't
> see why this is so bad: we end up with safer usage, more easily auditable
> code behavior ("how was this atomic_t instance _intended_ to be used?"),
> and no change in binary size.

I simply do not believe in that. Maybe I've seen too much code like:

  atomic_dec_and_test(&counter->cnt);

or god forbid (thanks Will):

  typedef union { refcount_t ref; counter_atomic_t cnt; atomic_t atm; } my_awesome_type_t;

People simply aren't too fussed. They'll do whatever it takes (to get
the job done with minimal effort).

You think you've cleaned things up, but the moment you turn your back,
it'll decend into madness at break-neck speed. This is part economics
and part the fact that C is crap (see the above two examples).

Can't we use static-analysis/robots for this? The constant feedback from
those is the only thing that helps stem the tide of crap.

Create a variable attribute __wrap, and have the robot complain when
atomic_*_and_test() are used on a variable declared with __wrap. Or
better yet, when a __wrap heads a condition that leads to call_rcu() /
*free*().

> > > There is no need to keep inc_return in this API as such. I included it
> > > so it can be used for above cases 1 and 2, so the users don't have to
> > > call inc() followed by read(). It can be left out of the API.
> 
> I go back and forth on this, but after looking at these instances,
> it makes sense to have inc_return(), for where counters are actually
> "serial numbers". An argument could be made[1], however, that such uses
> should not end up in the position of _reusing_ earlier identifiers, which
> means it's actually can't wrap. (And some cases just need u64 to make this
> happen[2] -- and in that specific case, don't even need to be atomic_t).
> 
> [1] https://lore.kernel.org/lkml/202010071334.8298F3FA7@keescook/
> [2] https://git.kernel.org/linus/d1e7fd6462ca9fc76650fbe6ca800e35b24267da

Sure, there's valid cases where wrapping is ok, but it's a distinct
use-case and really shouldn't be mixed up in the statistics one --
they're distinct.

The fact that patch #4 appears to get this wrong seems like a fairly big
clue.

> > Wrong! The atomic usage in mutex doesn't fall in any of those
> > categories.
> 
> But the atomic usage in mutex is *IN* mutex -- it's a separate data
> type, etc. We don't build mutexes manually, so why build counters
> manually?

The claim was: atomic usage in the kernel falls in these 3 categories.

 - mutex uses atomic (atomic_long_t);
 - mutex is in the kernel;
 - mutex's use of atomic does not fit the listed categories.

Therefore, the claim is false. In fact most of the locking primitives
use atomic one way or another. And yes, some people do build their own.

> > The only thing you're all saying that makes sense is that unintentional
> > wrapping can have bad consequences, the rest is pure confusion.
> > 
> > Focus on the non-wrapping cases, _everything_ else is not going
> > anywhere.
> 
> I view this as a way to do so: this subset of wrapping cases is being
> identified and removed from the pool of all the atomic_t cases so that
> they will have been classified, and we can continue to narrow down all
> the atomic_t uses to find any potentially mis-used non-wrapping cases.
> 
> The other option is adding some kind of attribute to the declarations
> (which gets us the annotation) but doesn't provide a limit to the API.
> (e.g. no counter should ever call dec_return).

See above; robots can do exactly that.

> > So audit the kernel, find the cases that should not wrap, categorize and
> > create APIs for them that trap the wrapping. But don't go around
> > confusing things that don't need confusion.
> 
> That's what's happening here. But as it turns out, it's easier to do
> this by employing both the process of elimination (mark the counters)
> and direct identification (mark the refcount_t). Then the pool of
> "unannotated" atomic_t instances continues to shrink.

That's like saying: "I'm too lazy to track what I've looked at already".
You're basically proposing to graffiti "Kees was here -- 16/10/2020" all
over the kernel. Just so you can see where you still need to go.

It says the code was (assuming your audit was correct) good at that
date, but has no guarantees for any moment after that.
Shuah Khan Oct. 16, 2020, 9:56 p.m. UTC | #10
On 10/14/20 5:31 PM, Kees Cook wrote:
> On Wed, Oct 14, 2020 at 11:17:20AM +0200, Peter Zijlstra wrote:
>> On Tue, Oct 13, 2020 at 08:12:20PM -0600, Shuah Khan wrote:
>>
>>> They don't add any new behavior, As Kees mentioned they do give us a
>>> way to clearly differentiate atomic usages that can wrap.
>>
>> No it doesn't! atomic_t can wrap, this thing can wrap, no distinction.
>>
>> All it does is fragment the API and sow confusion. FOR NO BENEFIT.
> 
> I really don't see it this way. It's a distinct subset of the atomic_t
> API. The trouble that has existed here has been with an atomic_t being
> originally used NOT for lifetime management, that mutates into something
> like that because of the available API, but doing so without realizing
> it. atomic_t gets used for all kinds of algorithms, and the "counter"
> type is way too easily accidentally transformed into a "lifetime
> tracker" and we get bugs.
> 
> If we have a distinct type for wrapping-counters that limits the API,
> then it is much harder for folks to shoot themselves in the foot. I don't
> see why this is so bad: we end up with safer usage, more easily auditable
> code behavior ("how was this atomic_t instance _intended_ to be used?"),
> and no change in binary size.
> 
>>> There is no need to keep inc_return in this API as such. I included it
>>> so it can be used for above cases 1 and 2, so the users don't have to
>>> call inc() followed by read(). It can be left out of the API.
> 
> I go back and forth on this, but after looking at these instances,
> it makes sense to have inc_return(), for where counters are actually
> "serial numbers". An argument could be made[1], however, that such uses
> should not end up in the position of _reusing_ earlier identifiers, which
> means it's actually can't wrap. (And some cases just need u64 to make this
> happen[2] -- and in that specific case, don't even need to be atomic_t).
> 
> [1] https://lore.kernel.org/lkml/202010071334.8298F3FA7@keescook/
> [2] https://git.kernel.org/linus/d1e7fd6462ca9fc76650fbe6ca800e35b24267da
> 
>> Wrong! The atomic usage in mutex doesn't fall in any of those
>> categories.
> 
> But the atomic usage in mutex is *IN* mutex -- it's a separate data
> type, etc. We don't build mutexes manually, so why build counters
> manually?
> 
>> The only thing you're all saying that makes sense is that unintentional
>> wrapping can have bad consequences, the rest is pure confusion.
>>
>> Focus on the non-wrapping cases, _everything_ else is not going
>> anywhere.
> 
> I view this as a way to do so: this subset of wrapping cases is being
> identified and removed from the pool of all the atomic_t cases so that
> they will have been classified, and we can continue to narrow down all
> the atomic_t uses to find any potentially mis-used non-wrapping cases.
> 
> The other option is adding some kind of attribute to the declarations
> (which gets us the annotation) but doesn't provide a limit to the API.
> (e.g. no counter should ever call dec_return).
> 

Not sure about that. We have more than dec_return to deal with. More on
this below.

>> So audit the kernel, find the cases that should not wrap, categorize and
>> create APIs for them that trap the wrapping. But don't go around
>> confusing things that don't need confusion.
> 
> That's what's happening here. But as it turns out, it's easier to do
> this by employing both the process of elimination (mark the counters)
> and direct identification (mark the refcount_t). Then the pool of
> "unannotated" atomic_t instances continues to shrink.
> 

Right auditing is what is happening now.

Let me summarize the discussion:

atomic_t api provides a wide range of atomic operations as a base
api to implement atomic counters, bitops, spinlock interfaces.
The usages also evolved into being used for resource lifetimes and
state management. Then came refcount_t api to address resource lifetime
problems related to atomic_t wrapping.

There is a large overlap between the atomic_t api used for resource
lifetimes and just counters. Not all counters used for resource
lifetimes can be converted to refcount_t.

A few quick "git grep" numbers on atomic_t interfaces usage:

Common for all:

atomic_set() - 3418
atomic_read() - 5833
atomic_inc() - 3376
atomic_dec() - 2498
atomic_inc_return() - 612

Counters don't need these:

atomic_dec_return() - 295
atomic_add_return() - 209
atomic_sub_return() - 144
atomic_add() - 744
atomic_sub() - 371
atomic_dec_and_test() - 552

You can see from these numbers, the volume of common usages that make
it difficult to separate out counters vs. non-counter usages.

The problem we are now running into is, it is becoming difficult
weed out candidates for refcount_t conversion in this noise.

Isolating a smaller subset of arithmetic atomic ops to address this
specific counters use-case will help reduce noise. This way we can
go through this work once and convert all counters to use this narrow
scoped api and what is left is non-counter usages.

The current situation is more confusing and adding a narrowly focused
api for counters reduces it and makes it easier.

thanks,
-- Shuah
Kees Cook Oct. 16, 2020, 10:51 p.m. UTC | #11
On Fri, Oct 16, 2020 at 12:53:13PM +0200, Peter Zijlstra wrote:
> That's like saying: "I'm too lazy to track what I've looked at already".
> You're basically proposing to graffiti "Kees was here -- 16/10/2020" all
> over the kernel. Just so you can see where you still need to go.
> 
> It says the code was (assuming your audit was correct) good at that
> date, but has no guarantees for any moment after that.

That kind of bit-rot marking is exactly what I would like to avoid: just
putting a comment in is pointless. Making the expectations of the usage
become _enforced_ is the goal. And having it enforced by the _compiler_
is key. Just adding a meaningless attribute that a static checker
will notice some time and hope people fix them doesn't scale either
(just look at how many sparse warnings there are). So with C's limits,
the API and type enforcement become the principle way to get this done.

So, if there are behavioral patterns we CAN carve away from atomic_t
cleanly (and I think there are), those are the ones I want to work
towards. The "corner case" uses of atomic_t are much less common than
the "big" code patterns like lifetime management (now delegated to and
enforced by refcount_t). My estimation was that _statistics_ (and not
"serial identifiers") was the next biggest code pattern using atomic_t
that could benefit from having its usage isolated. It is not itself a
dangerous code pattern, but it can mask finding them.

Then, at the end of the day, only the corner cases remain, and those can
be seen clearly as they change over time. Since we can never have a
one-time audit be anything other than advisory, we need to make it EASY
to do those kinds of audits so they can be done regularly.
Dan Carpenter Nov. 10, 2020, 6:49 p.m. UTC | #12
On Fri, Oct 16, 2020 at 03:51:25PM -0700, Kees Cook wrote:
> On Fri, Oct 16, 2020 at 12:53:13PM +0200, Peter Zijlstra wrote:
> > That's like saying: "I'm too lazy to track what I've looked at already".
> > You're basically proposing to graffiti "Kees was here -- 16/10/2020" all
> > over the kernel. Just so you can see where you still need to go.
> > 
> > It says the code was (assuming your audit was correct) good at that
> > date, but has no guarantees for any moment after that.
> 
> That kind of bit-rot marking is exactly what I would like to avoid: just
> putting a comment in is pointless. Making the expectations of the usage
> become _enforced_ is the goal. And having it enforced by the _compiler_
> is key. Just adding a meaningless attribute that a static checker
> will notice some time and hope people fix them doesn't scale either
> (just look at how many sparse warnings there are).

Most Sparse warnings are false positives.  People do actually fix the
ones which matter.

I think this patchset could be useful.  I'm working on a refcounting
check for Smatch.  I want to warn about when we forget to drop a
reference on an error path.  Right now I just assume that anything with
"error", "drop" or "->stats->" in the name is just a counter.

regards,
dan carpenter