diff mbox series

audit: use mmget() instead of get_task_exe_file() when auditing @current

Message ID 20231018222023.371274-2-paul@paul-moore.com (mailing list archive)
State Superseded
Delegated to: Paul Moore
Headers show
Series audit: use mmget() instead of get_task_exe_file() when auditing @current | expand

Commit Message

Paul Moore Oct. 18, 2023, 10:20 p.m. UTC
The get_task_exe_file() function locks the given task with task_lock()
which when used inside audit_exe_compare() can cause deadlocks on
systems that generate audit records when the task_lock() is held.  As
we should be able to safely access current->mm we can drop the call to
get_task_exe_file() and get a reference to current->mm directly which
we can then use in a call to get_mm_exe_file() without having to take
task_lock().

While the audit_exe_compare() function takes an arbitrary task_struct
parameter, all of the callers outside of fork/clone, call
audit_exe_compare() to operate on the current task_struct, making it
the common case.  In the fork/clone case, when audit_exe_compare() is
called on the child task_struct, it should be safe to use the
get_task_mm() function as no other task should be holding task_lock()
at this time.

Cc: <stable@vger.kernel.org>
Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare")
Reported-by: Andreas Steinmetz <anstein99@googlemail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
---
 kernel/audit_watch.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

Comments

Paul Moore Oct. 18, 2023, 10:22 p.m. UTC | #1
Updating Mateusz's email.

On Wed, Oct 18, 2023 at 6:20 PM Paul Moore <paul@paul-moore.com> wrote:
>
> The get_task_exe_file() function locks the given task with task_lock()
> which when used inside audit_exe_compare() can cause deadlocks on
> systems that generate audit records when the task_lock() is held.  As
> we should be able to safely access current->mm we can drop the call to
> get_task_exe_file() and get a reference to current->mm directly which
> we can then use in a call to get_mm_exe_file() without having to take
> task_lock().
>
> While the audit_exe_compare() function takes an arbitrary task_struct
> parameter, all of the callers outside of fork/clone, call
> audit_exe_compare() to operate on the current task_struct, making it
> the common case.  In the fork/clone case, when audit_exe_compare() is
> called on the child task_struct, it should be safe to use the
> get_task_mm() function as no other task should be holding task_lock()
> at this time.
>
> Cc: <stable@vger.kernel.org>
> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare")
> Reported-by: Andreas Steinmetz <anstein99@googlemail.com>
> Signed-off-by: Paul Moore <paul@paul-moore.com>
> ---
>  kernel/audit_watch.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> index 65075f1e4ac8..fa3e6ea0e58c 100644
> --- a/kernel/audit_watch.c
> +++ b/kernel/audit_watch.c
> @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark)
>         struct file *exe_file;
>         unsigned long ino;
>         dev_t dev;
> +       struct mm_struct *mm;
>
> -       exe_file = get_task_exe_file(tsk);
> +       /* almost always operate on current, exceptions for fork/clone */
> +       if (likely(tsk == current)) {
> +               mmget(current->mm);
> +               mm = current->mm;
> +       } else {
> +               /* this can deadlock outside the fork/clone usage (see above) */
> +               mm = get_task_mm(tsk);
> +               if (!mm)
> +                       return 0;
> +       }
> +       exe_file = get_mm_exe_file(mm);
> +       mmput(mm);
>         if (!exe_file)
>                 return 0;
>         ino = file_inode(exe_file)->i_ino;
> --
> 2.42.0
Mateusz Guzik Oct. 19, 2023, 12:22 a.m. UTC | #2
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> The get_task_exe_file() function locks the given task with task_lock()
>> which when used inside audit_exe_compare() can cause deadlocks on
>> systems that generate audit records when the task_lock() is held.  As
>> we should be able to safely access current->mm we can drop the call to
>> get_task_exe_file() and get a reference to current->mm directly which
>> we can then use in a call to get_mm_exe_file() without having to take
>> task_lock().
>>

I don't follow what's the deadlock.

You mean they hold task lock for current and then call into
audit_exe_compare which takes it again?

I would argue any calls into the routine with *any* task already
locked are fishy at best (I'm assuming there is an established task
lock ordering which may be accidentally violated).

>> While the audit_exe_compare() function takes an arbitrary task_struct
>> parameter, all of the callers outside of fork/clone, call
>> audit_exe_compare() to operate on the current task_struct, making it
>> the common case.  In the fork/clone case, when audit_exe_compare() is
>> called on the child task_struct, it should be safe to use the
>> get_task_mm() function as no other task should be holding task_lock()
>> at this time.
>>
>> Cc: <stable@vger.kernel.org>
>> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare")
>> Reported-by: Andreas Steinmetz <anstein99@googlemail.com>
>> Signed-off-by: Paul Moore <paul@paul-moore.com>
>> ---
>>  kernel/audit_watch.c | 14 +++++++++++++-
>>  1 file changed, 13 insertions(+), 1 deletion(-)
>>
>> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
>> index 65075f1e4ac8..fa3e6ea0e58c 100644
>> --- a/kernel/audit_watch.c
>> +++ b/kernel/audit_watch.c
>> @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct
>> audit_fsnotify_mark *mark)
>>         struct file *exe_file;
>>         unsigned long ino;
>>         dev_t dev;
>> +       struct mm_struct *mm;
>>
>> -       exe_file = get_task_exe_file(tsk);
>> +       /* almost always operate on current, exceptions for fork/clone */
>> +       if (likely(tsk == current)) {
>> +               mmget(current->mm);
>> +               mm = current->mm;
>> +       } else {
>> +               /* this can deadlock outside the fork/clone usage (see
>> above) */
>> +               mm = get_task_mm(tsk);
>> +               if (!mm)
>> +                       return 0;
>> +       }
>> +       exe_file = get_mm_exe_file(mm);
>> +       mmput(mm);
>>         if (!exe_file)
>>                 return 0;
>>         ino = file_inode(exe_file)->i_ino;

Assuming concerns expressed above are moot, I would hide the logic in
audit_get_task_exe_file or similar.
Paul Moore Oct. 19, 2023, 1:29 a.m. UTC | #3
On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> >> The get_task_exe_file() function locks the given task with task_lock()
> >> which when used inside audit_exe_compare() can cause deadlocks on
> >> systems that generate audit records when the task_lock() is held.  As
> >> we should be able to safely access current->mm we can drop the call to
> >> get_task_exe_file() and get a reference to current->mm directly which
> >> we can then use in a call to get_mm_exe_file() without having to take
> >> task_lock().
>
> I don't follow what's the deadlock.
>
> You mean they hold task lock for current and then call into
> audit_exe_compare which takes it again?
>
> I would argue any calls into the routine with *any* task already
> locked are fishy at best (I'm assuming there is an established task
> lock ordering which may be accidentally violated).

A link to the original problem report is below, but the quick summary
is a capable() check in the do_prlimit() code path.

https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com

> >> While the audit_exe_compare() function takes an arbitrary task_struct
> >> parameter, all of the callers outside of fork/clone, call
> >> audit_exe_compare() to operate on the current task_struct, making it
> >> the common case.  In the fork/clone case, when audit_exe_compare() is
> >> called on the child task_struct, it should be safe to use the
> >> get_task_mm() function as no other task should be holding task_lock()
> >> at this time.
> >>
> >> Cc: <stable@vger.kernel.org>
> >> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare")
> >> Reported-by: Andreas Steinmetz <anstein99@googlemail.com>
> >> Signed-off-by: Paul Moore <paul@paul-moore.com>
> >> ---
> >>  kernel/audit_watch.c | 14 +++++++++++++-
> >>  1 file changed, 13 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
> >> index 65075f1e4ac8..fa3e6ea0e58c 100644
> >> --- a/kernel/audit_watch.c
> >> +++ b/kernel/audit_watch.c
> >> @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct
> >> audit_fsnotify_mark *mark)
> >>         struct file *exe_file;
> >>         unsigned long ino;
> >>         dev_t dev;
> >> +       struct mm_struct *mm;
> >>
> >> -       exe_file = get_task_exe_file(tsk);
> >> +       /* almost always operate on current, exceptions for fork/clone */
> >> +       if (likely(tsk == current)) {
> >> +               mmget(current->mm);
> >> +               mm = current->mm;
> >> +       } else {
> >> +               /* this can deadlock outside the fork/clone usage (see
> >> above) */
> >> +               mm = get_task_mm(tsk);
> >> +               if (!mm)
> >> +                       return 0;
> >> +       }
> >> +       exe_file = get_mm_exe_file(mm);
> >> +       mmput(mm);
> >>         if (!exe_file)
> >>                 return 0;
> >>         ino = file_inode(exe_file)->i_ino;
>
> Assuming concerns expressed above are moot, I would hide the logic in
> audit_get_task_exe_file or similar.

If there was any other caller I would agree with you, but
audit_exe_compare() is the only place where we need this logic and
considering the size of the function it seems silly to break out the
mm/exe_file logic into a separate function.
Mateusz Guzik Oct. 19, 2023, 2:14 a.m. UTC | #4
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> >> The get_task_exe_file() function locks the given task with task_lock()
>> >> which when used inside audit_exe_compare() can cause deadlocks on
>> >> systems that generate audit records when the task_lock() is held.  As
>> >> we should be able to safely access current->mm we can drop the call to
>> >> get_task_exe_file() and get a reference to current->mm directly which
>> >> we can then use in a call to get_mm_exe_file() without having to take
>> >> task_lock().
>>
>> I don't follow what's the deadlock.
>>
>> You mean they hold task lock for current and then call into
>> audit_exe_compare which takes it again?
>>
>> I would argue any calls into the routine with *any* task already
>> locked are fishy at best (I'm assuming there is an established task
>> lock ordering which may be accidentally violated).
>
> A link to the original problem report is below, but the quick summary
> is a capable() check in the do_prlimit() code path.
>
> https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com
>

While get_task_exe_file working without taking the task lock would be
nice, I think the *real* bug is do_prlimit calling into
capabilities/security machinery with said lock held to begin with.

Taking 2 task locks taken in an arbitrary order sounds incredibly
dodgy and is probably already illegal. Even with the proposed patch it
would happen with do_prlimit invoked by non-group leader.

With that assumption, if one tries to argue that calling into
capabilities() and security_* hooks with a task lock is legal, then
none of the code inside can take the lock, which sounds like an
unsustainable limitation (even if it so happens that audit calling
into get_task_exe_file is the only case at the moment).

All that said I think the cleanest way forward is to add a dedicated
spinlock for rlimits and stuff it into signal_struct. I'm going to
hack it up tomorrow, but some more people will need to be added to the
discussion.
Mateusz Guzik Oct. 19, 2023, 2:18 a.m. UTC | #5
On 10/19/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>>> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>>> >> The get_task_exe_file() function locks the given task with
>>> >> task_lock()
>>> >> which when used inside audit_exe_compare() can cause deadlocks on
>>> >> systems that generate audit records when the task_lock() is held.  As
>>> >> we should be able to safely access current->mm we can drop the call
>>> >> to
>>> >> get_task_exe_file() and get a reference to current->mm directly which
>>> >> we can then use in a call to get_mm_exe_file() without having to take
>>> >> task_lock().
>>>
>>> I don't follow what's the deadlock.
>>>
>>> You mean they hold task lock for current and then call into
>>> audit_exe_compare which takes it again?
>>>
>>> I would argue any calls into the routine with *any* task already
>>> locked are fishy at best (I'm assuming there is an established task
>>> lock ordering which may be accidentally violated).
>>
>> A link to the original problem report is below, but the quick summary
>> is a capable() check in the do_prlimit() code path.
>>
>> https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com
>>
>
> While get_task_exe_file working without taking the task lock would be
> nice, I think the *real* bug is do_prlimit calling into
> capabilities/security machinery with said lock held to begin with.
>
> Taking 2 task locks taken in an arbitrary order sounds incredibly
> dodgy and is probably already illegal. Even with the proposed patch it
> would happen with do_prlimit invoked by non-group leader.
>
> With that assumption, if one tries to argue that calling into
> capabilities() and security_* hooks with a task lock is legal, then
> none of the code inside can take the lock, which sounds like an
> unsustainable limitation (even if it so happens that audit calling
> into get_task_exe_file is the only case at the moment).
>
> All that said I think the cleanest way forward is to add a dedicated
> spinlock for rlimits and stuff it into signal_struct. I'm going to
> hack it up tomorrow, but some more people will need to be added to the
> discussion.
>

sigh, now that I wrote I remembered prlimit operates on arbitrary
threads to begin with.

I'm going to have to sleep on it.
Paul Moore Oct. 19, 2023, 2:42 p.m. UTC | #6
On Wed, Oct 18, 2023 at 10:14 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> > On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> >> >> The get_task_exe_file() function locks the given task with task_lock()
> >> >> which when used inside audit_exe_compare() can cause deadlocks on
> >> >> systems that generate audit records when the task_lock() is held.  As
> >> >> we should be able to safely access current->mm we can drop the call to
> >> >> get_task_exe_file() and get a reference to current->mm directly which
> >> >> we can then use in a call to get_mm_exe_file() without having to take
> >> >> task_lock().
> >>
> >> I don't follow what's the deadlock.
> >>
> >> You mean they hold task lock for current and then call into
> >> audit_exe_compare which takes it again?
> >>
> >> I would argue any calls into the routine with *any* task already
> >> locked are fishy at best (I'm assuming there is an established task
> >> lock ordering which may be accidentally violated).
> >
> > A link to the original problem report is below, but the quick summary
> > is a capable() check in the do_prlimit() code path.
> >
> > https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com
>
> While get_task_exe_file working without taking the task lock would be
> nice, I think the *real* bug is do_prlimit calling into
> capabilities/security machinery with said lock held to begin with.

I'm not going to argue about do_prlimit() taking the task_lock(), I
can't say I've thought enough about that, but regardless of what
happens there I think we can fix things on the audit side as well.
Thinking about it a bit more this morning, I think we can safely
ignore the non-@current case in audit_exe_compare() as the whole point
of the audit exe filter is to record the actions of processes
instantiated from that executable file; if the @current task is not
being logged/filtered, we shouldn't have to worry about the exe
filter.

I'll send out a v2 of the patch later with those changes, but like I
said at the top, please don't let that stop you making improvements
elsewhere.
Mateusz Guzik Oct. 19, 2023, 2:52 p.m. UTC | #7
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> Thinking about it a bit more this morning, I think we can safely
> ignore the non-@current case in audit_exe_compare() as the whole point
> of the audit exe filter is to record the actions of processes
> instantiated from that executable file; if the @current task is not
> being logged/filtered, we shouldn't have to worry about the exe
> filter.
>

I did a quick stab at figuring out whether one can get there with
non-current to begin with, but did not convince myself it is not
possible.

That said, should you repost, I think refing and unrefing mm should be a voided.

The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a
("prlimit: do not grab the tasklist_lock") which converted that lock
to task_lock. So I don't think pointing at my patch as "Fixes" is
accurate, but I'm not going to insist. ;)

As for prlimit, I think I'm going to add the lock to signal struct for
thread within one thread group. At the same time "remote" prlimit will
instead resort to tasklist_lock, like it did prior to the above
commit. Then the new lock can be converted to a seqlock, so that
grabbing rlimits is cheaper. I'm going to create a new thread about it
later.
Paul Moore Oct. 19, 2023, 4:07 p.m. UTC | #8
On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> > Thinking about it a bit more this morning, I think we can safely
> > ignore the non-@current case in audit_exe_compare() as the whole point
> > of the audit exe filter is to record the actions of processes
> > instantiated from that executable file; if the @current task is not
> > being logged/filtered, we shouldn't have to worry about the exe
> > filter.
>
> I did a quick stab at figuring out whether one can get there with
> non-current to begin with, but did not convince myself it is not
> possible.
>
> That said, should you repost, I think refing and unrefing mm should be a voided.

We have to deref current->mm to get the exe_file, but so long as we
get a reference with mmget()/mmput() it should be safe, no?

> The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a
> ("prlimit: do not grab the tasklist_lock") which converted that lock
> to task_lock. So I don't think pointing at my patch as "Fixes" is
> accurate, but I'm not going to insist. ;)

Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did
you copy-n-past the wrong commit or am I missing something?  From what
I can see, the task_lock() was first introduced back in 2009 with
86f162f4c75c ("rlimits: do security check under task_lock").

--
paul-moore.com
Mateusz Guzik Oct. 19, 2023, 4:55 p.m. UTC | #9
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> > Thinking about it a bit more this morning, I think we can safely
>> > ignore the non-@current case in audit_exe_compare() as the whole point
>> > of the audit exe filter is to record the actions of processes
>> > instantiated from that executable file; if the @current task is not
>> > being logged/filtered, we shouldn't have to worry about the exe
>> > filter.
>>
>> I did a quick stab at figuring out whether one can get there with
>> non-current to begin with, but did not convince myself it is not
>> possible.
>>
>> That said, should you repost, I think refing and unrefing mm should be a
>> voided.
>
> We have to deref current->mm to get the exe_file, but so long as we
> get a reference with mmget()/mmput() it should be safe, no?
>

For task == current the very condition which allows you to safely
mmget also makes the operation redundant -- current already has a ref
on mm for as long as it executes.

>> The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a
>> ("prlimit: do not grab the tasklist_lock") which converted that lock
>> to task_lock. So I don't think pointing at my patch as "Fixes" is
>> accurate, but I'm not going to insist. ;)
>
> Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did
> you copy-n-past the wrong commit or am I missing something?  From what
> I can see, the task_lock() was first introduced back in 2009 with
> 86f162f4c75c ("rlimits: do security check under task_lock").
>

Huh. That's some brainfarting by me, indeed with this in place it is
my commit which regresses.
Paul Moore Oct. 19, 2023, 6:15 p.m. UTC | #10
On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> >> > Thinking about it a bit more this morning, I think we can safely
> >> > ignore the non-@current case in audit_exe_compare() as the whole point
> >> > of the audit exe filter is to record the actions of processes
> >> > instantiated from that executable file; if the @current task is not
> >> > being logged/filtered, we shouldn't have to worry about the exe
> >> > filter.
> >>
> >> I did a quick stab at figuring out whether one can get there with
> >> non-current to begin with, but did not convince myself it is not
> >> possible.
> >>
> >> That said, should you repost, I think refing and unrefing mm should be a
> >> voided.
> >
> > We have to deref current->mm to get the exe_file, but so long as we
> > get a reference with mmget()/mmput() it should be safe, no?
>
> For task == current the very condition which allows you to safely
> mmget also makes the operation redundant -- current already has a ref
> on mm for as long as it executes.

I've been using the move_pages(2) syscall code as an example and in
the find_mm_struct() function the code either does a mmget() if
accessing current->mm or a get_task_mm() if accessing an arbitrary
task.

  -> SYSCALL_DEFINE6(move_pages, ...)
    -> kernel_move_pages(...)
      -> find_mm_struct(...)

What am I missing?

> >> The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a
> >> ("prlimit: do not grab the tasklist_lock") which converted that lock
> >> to task_lock. So I don't think pointing at my patch as "Fixes" is
> >> accurate, but I'm not going to insist. ;)
> >
> > Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did
> > you copy-n-past the wrong commit or am I missing something?  From what
> > I can see, the task_lock() was first introduced back in 2009 with
> > 86f162f4c75c ("rlimits: do security check under task_lock").
> >
>
> Huh. That's some brainfarting by me, indeed with this in place it is
> my commit which regresses.

No worries, thanks for the confirmation.
Mateusz Guzik Oct. 21, 2023, 1:51 p.m. UTC | #11
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
>> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com>
>> > wrote:
>> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
>> >> > Thinking about it a bit more this morning, I think we can safely
>> >> > ignore the non-@current case in audit_exe_compare() as the whole
>> >> > point
>> >> > of the audit exe filter is to record the actions of processes
>> >> > instantiated from that executable file; if the @current task is not
>> >> > being logged/filtered, we shouldn't have to worry about the exe
>> >> > filter.
>> >>
>> >> I did a quick stab at figuring out whether one can get there with
>> >> non-current to begin with, but did not convince myself it is not
>> >> possible.
>> >>
>> >> That said, should you repost, I think refing and unrefing mm should be
>> >> a
>> >> voided.
>> >
>> > We have to deref current->mm to get the exe_file, but so long as we
>> > get a reference with mmget()/mmput() it should be safe, no?
>>
>> For task == current the very condition which allows you to safely
>> mmget also makes the operation redundant -- current already has a ref
>> on mm for as long as it executes.
>
> I've been using the move_pages(2) syscall code as an example and in
> the find_mm_struct() function the code either does a mmget() if
> accessing current->mm or a get_task_mm() if accessing an arbitrary
> task.
>
>   -> SYSCALL_DEFINE6(move_pages, ...)
>     -> kernel_move_pages(...)
>       -> find_mm_struct(...)
>
> What am I missing?
>

To my reading they did not want to special-case it in the caller --
they just mmput regardless of what task is.

Again, you may notice there are no locks held in that usecase. The
pointer is blindly derefed and a refcount bumped. For this to be safe,
it has to be guaranteed > 0 already, which it is for a task accessing
it's own mm struct. You can find the releasing mmput in exit ->
do_exit -> exit_mm callchain.
Paul Moore Oct. 24, 2023, 3:44 p.m. UTC | #12
On Sat, Oct 21, 2023 at 9:51 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> > On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote:
> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> >> > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com>
> >> > wrote:
> >> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote:
> >> >> > Thinking about it a bit more this morning, I think we can safely
> >> >> > ignore the non-@current case in audit_exe_compare() as the whole
> >> >> > point
> >> >> > of the audit exe filter is to record the actions of processes
> >> >> > instantiated from that executable file; if the @current task is not
> >> >> > being logged/filtered, we shouldn't have to worry about the exe
> >> >> > filter.
> >> >>
> >> >> I did a quick stab at figuring out whether one can get there with
> >> >> non-current to begin with, but did not convince myself it is not
> >> >> possible.
> >> >>
> >> >> That said, should you repost, I think refing and unrefing mm should be
> >> >> a
> >> >> voided.
> >> >
> >> > We have to deref current->mm to get the exe_file, but so long as we
> >> > get a reference with mmget()/mmput() it should be safe, no?
> >>
> >> For task == current the very condition which allows you to safely
> >> mmget also makes the operation redundant -- current already has a ref
> >> on mm for as long as it executes.
> >
> > I've been using the move_pages(2) syscall code as an example and in
> > the find_mm_struct() function the code either does a mmget() if
> > accessing current->mm or a get_task_mm() if accessing an arbitrary
> > task.
> >
> >   -> SYSCALL_DEFINE6(move_pages, ...)
> >     -> kernel_move_pages(...)
> >       -> find_mm_struct(...)
> >
> > What am I missing?
> >
>
> To my reading they did not want to special-case it in the caller --
> they just mmput regardless of what task is.
>
> Again, you may notice there are no locks held in that usecase. The
> pointer is blindly derefed and a refcount bumped. For this to be safe,
> it has to be guaranteed > 0 already, which it is for a task accessing
> it's own mm struct. You can find the releasing mmput in exit ->
> do_exit -> exit_mm callchain.

I'll have a v2 out soon (later today?).
diff mbox series

Patch

diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 65075f1e4ac8..fa3e6ea0e58c 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -526,8 +526,20 @@  int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark)
 	struct file *exe_file;
 	unsigned long ino;
 	dev_t dev;
+	struct mm_struct *mm;
 
-	exe_file = get_task_exe_file(tsk);
+	/* almost always operate on current, exceptions for fork/clone */
+	if (likely(tsk == current)) {
+		mmget(current->mm);
+		mm = current->mm;
+	} else {
+		/* this can deadlock outside the fork/clone usage (see above) */
+		mm = get_task_mm(tsk);
+		if (!mm)
+			return 0;
+	}
+	exe_file = get_mm_exe_file(mm);
+	mmput(mm);
 	if (!exe_file)
 		return 0;
 	ino = file_inode(exe_file)->i_ino;