Message ID | 20231018222023.371274-2-paul@paul-moore.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Paul Moore |
Headers | show |
Series | audit: use mmget() instead of get_task_exe_file() when auditing @current | expand |
Updating Mateusz's email. On Wed, Oct 18, 2023 at 6:20 PM Paul Moore <paul@paul-moore.com> wrote: > > The get_task_exe_file() function locks the given task with task_lock() > which when used inside audit_exe_compare() can cause deadlocks on > systems that generate audit records when the task_lock() is held. As > we should be able to safely access current->mm we can drop the call to > get_task_exe_file() and get a reference to current->mm directly which > we can then use in a call to get_mm_exe_file() without having to take > task_lock(). > > While the audit_exe_compare() function takes an arbitrary task_struct > parameter, all of the callers outside of fork/clone, call > audit_exe_compare() to operate on the current task_struct, making it > the common case. In the fork/clone case, when audit_exe_compare() is > called on the child task_struct, it should be safe to use the > get_task_mm() function as no other task should be holding task_lock() > at this time. > > Cc: <stable@vger.kernel.org> > Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare") > Reported-by: Andreas Steinmetz <anstein99@googlemail.com> > Signed-off-by: Paul Moore <paul@paul-moore.com> > --- > kernel/audit_watch.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > index 65075f1e4ac8..fa3e6ea0e58c 100644 > --- a/kernel/audit_watch.c > +++ b/kernel/audit_watch.c > @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark) > struct file *exe_file; > unsigned long ino; > dev_t dev; > + struct mm_struct *mm; > > - exe_file = get_task_exe_file(tsk); > + /* almost always operate on current, exceptions for fork/clone */ > + if (likely(tsk == current)) { > + mmget(current->mm); > + mm = current->mm; > + } else { > + /* this can deadlock outside the fork/clone usage (see above) */ > + mm = get_task_mm(tsk); > + if (!mm) > + return 0; > + } > + exe_file = get_mm_exe_file(mm); > + mmput(mm); > if (!exe_file) > return 0; > ino = file_inode(exe_file)->i_ino; > -- > 2.42.0
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> The get_task_exe_file() function locks the given task with task_lock() >> which when used inside audit_exe_compare() can cause deadlocks on >> systems that generate audit records when the task_lock() is held. As >> we should be able to safely access current->mm we can drop the call to >> get_task_exe_file() and get a reference to current->mm directly which >> we can then use in a call to get_mm_exe_file() without having to take >> task_lock(). >> I don't follow what's the deadlock. You mean they hold task lock for current and then call into audit_exe_compare which takes it again? I would argue any calls into the routine with *any* task already locked are fishy at best (I'm assuming there is an established task lock ordering which may be accidentally violated). >> While the audit_exe_compare() function takes an arbitrary task_struct >> parameter, all of the callers outside of fork/clone, call >> audit_exe_compare() to operate on the current task_struct, making it >> the common case. In the fork/clone case, when audit_exe_compare() is >> called on the child task_struct, it should be safe to use the >> get_task_mm() function as no other task should be holding task_lock() >> at this time. >> >> Cc: <stable@vger.kernel.org> >> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare") >> Reported-by: Andreas Steinmetz <anstein99@googlemail.com> >> Signed-off-by: Paul Moore <paul@paul-moore.com> >> --- >> kernel/audit_watch.c | 14 +++++++++++++- >> 1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c >> index 65075f1e4ac8..fa3e6ea0e58c 100644 >> --- a/kernel/audit_watch.c >> +++ b/kernel/audit_watch.c >> @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct >> audit_fsnotify_mark *mark) >> struct file *exe_file; >> unsigned long ino; >> dev_t dev; >> + struct mm_struct *mm; >> >> - exe_file = get_task_exe_file(tsk); >> + /* almost always operate on current, exceptions for fork/clone */ >> + if (likely(tsk == current)) { >> + mmget(current->mm); >> + mm = current->mm; >> + } else { >> + /* this can deadlock outside the fork/clone usage (see >> above) */ >> + mm = get_task_mm(tsk); >> + if (!mm) >> + return 0; >> + } >> + exe_file = get_mm_exe_file(mm); >> + mmput(mm); >> if (!exe_file) >> return 0; >> ino = file_inode(exe_file)->i_ino; Assuming concerns expressed above are moot, I would hide the logic in audit_get_task_exe_file or similar.
On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > >> The get_task_exe_file() function locks the given task with task_lock() > >> which when used inside audit_exe_compare() can cause deadlocks on > >> systems that generate audit records when the task_lock() is held. As > >> we should be able to safely access current->mm we can drop the call to > >> get_task_exe_file() and get a reference to current->mm directly which > >> we can then use in a call to get_mm_exe_file() without having to take > >> task_lock(). > > I don't follow what's the deadlock. > > You mean they hold task lock for current and then call into > audit_exe_compare which takes it again? > > I would argue any calls into the routine with *any* task already > locked are fishy at best (I'm assuming there is an established task > lock ordering which may be accidentally violated). A link to the original problem report is below, but the quick summary is a capable() check in the do_prlimit() code path. https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com > >> While the audit_exe_compare() function takes an arbitrary task_struct > >> parameter, all of the callers outside of fork/clone, call > >> audit_exe_compare() to operate on the current task_struct, making it > >> the common case. In the fork/clone case, when audit_exe_compare() is > >> called on the child task_struct, it should be safe to use the > >> get_task_mm() function as no other task should be holding task_lock() > >> at this time. > >> > >> Cc: <stable@vger.kernel.org> > >> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare") > >> Reported-by: Andreas Steinmetz <anstein99@googlemail.com> > >> Signed-off-by: Paul Moore <paul@paul-moore.com> > >> --- > >> kernel/audit_watch.c | 14 +++++++++++++- > >> 1 file changed, 13 insertions(+), 1 deletion(-) > >> > >> diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c > >> index 65075f1e4ac8..fa3e6ea0e58c 100644 > >> --- a/kernel/audit_watch.c > >> +++ b/kernel/audit_watch.c > >> @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct > >> audit_fsnotify_mark *mark) > >> struct file *exe_file; > >> unsigned long ino; > >> dev_t dev; > >> + struct mm_struct *mm; > >> > >> - exe_file = get_task_exe_file(tsk); > >> + /* almost always operate on current, exceptions for fork/clone */ > >> + if (likely(tsk == current)) { > >> + mmget(current->mm); > >> + mm = current->mm; > >> + } else { > >> + /* this can deadlock outside the fork/clone usage (see > >> above) */ > >> + mm = get_task_mm(tsk); > >> + if (!mm) > >> + return 0; > >> + } > >> + exe_file = get_mm_exe_file(mm); > >> + mmput(mm); > >> if (!exe_file) > >> return 0; > >> ino = file_inode(exe_file)->i_ino; > > Assuming concerns expressed above are moot, I would hide the logic in > audit_get_task_exe_file or similar. If there was any other caller I would agree with you, but audit_exe_compare() is the only place where we need this logic and considering the size of the function it seems silly to break out the mm/exe_file logic into a separate function.
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote: >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> >> The get_task_exe_file() function locks the given task with task_lock() >> >> which when used inside audit_exe_compare() can cause deadlocks on >> >> systems that generate audit records when the task_lock() is held. As >> >> we should be able to safely access current->mm we can drop the call to >> >> get_task_exe_file() and get a reference to current->mm directly which >> >> we can then use in a call to get_mm_exe_file() without having to take >> >> task_lock(). >> >> I don't follow what's the deadlock. >> >> You mean they hold task lock for current and then call into >> audit_exe_compare which takes it again? >> >> I would argue any calls into the routine with *any* task already >> locked are fishy at best (I'm assuming there is an established task >> lock ordering which may be accidentally violated). > > A link to the original problem report is below, but the quick summary > is a capable() check in the do_prlimit() code path. > > https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com > While get_task_exe_file working without taking the task lock would be nice, I think the *real* bug is do_prlimit calling into capabilities/security machinery with said lock held to begin with. Taking 2 task locks taken in an arbitrary order sounds incredibly dodgy and is probably already illegal. Even with the proposed patch it would happen with do_prlimit invoked by non-group leader. With that assumption, if one tries to argue that calling into capabilities() and security_* hooks with a task lock is legal, then none of the code inside can take the lock, which sounds like an unsustainable limitation (even if it so happens that audit calling into get_task_exe_file is the only case at the moment). All that said I think the cleanest way forward is to add a dedicated spinlock for rlimits and stuff it into signal_struct. I'm going to hack it up tomorrow, but some more people will need to be added to the discussion.
On 10/19/23, Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote: >>> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >>> >> The get_task_exe_file() function locks the given task with >>> >> task_lock() >>> >> which when used inside audit_exe_compare() can cause deadlocks on >>> >> systems that generate audit records when the task_lock() is held. As >>> >> we should be able to safely access current->mm we can drop the call >>> >> to >>> >> get_task_exe_file() and get a reference to current->mm directly which >>> >> we can then use in a call to get_mm_exe_file() without having to take >>> >> task_lock(). >>> >>> I don't follow what's the deadlock. >>> >>> You mean they hold task lock for current and then call into >>> audit_exe_compare which takes it again? >>> >>> I would argue any calls into the routine with *any* task already >>> locked are fishy at best (I'm assuming there is an established task >>> lock ordering which may be accidentally violated). >> >> A link to the original problem report is below, but the quick summary >> is a capable() check in the do_prlimit() code path. >> >> https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com >> > > While get_task_exe_file working without taking the task lock would be > nice, I think the *real* bug is do_prlimit calling into > capabilities/security machinery with said lock held to begin with. > > Taking 2 task locks taken in an arbitrary order sounds incredibly > dodgy and is probably already illegal. Even with the proposed patch it > would happen with do_prlimit invoked by non-group leader. > > With that assumption, if one tries to argue that calling into > capabilities() and security_* hooks with a task lock is legal, then > none of the code inside can take the lock, which sounds like an > unsustainable limitation (even if it so happens that audit calling > into get_task_exe_file is the only case at the moment). > > All that said I think the cleanest way forward is to add a dedicated > spinlock for rlimits and stuff it into signal_struct. I'm going to > hack it up tomorrow, but some more people will need to be added to the > discussion. > sigh, now that I wrote I remembered prlimit operates on arbitrary threads to begin with. I'm going to have to sleep on it.
On Wed, Oct 18, 2023 at 10:14 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > > On Wed, Oct 18, 2023 at 8:22 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > >> >> The get_task_exe_file() function locks the given task with task_lock() > >> >> which when used inside audit_exe_compare() can cause deadlocks on > >> >> systems that generate audit records when the task_lock() is held. As > >> >> we should be able to safely access current->mm we can drop the call to > >> >> get_task_exe_file() and get a reference to current->mm directly which > >> >> we can then use in a call to get_mm_exe_file() without having to take > >> >> task_lock(). > >> > >> I don't follow what's the deadlock. > >> > >> You mean they hold task lock for current and then call into > >> audit_exe_compare which takes it again? > >> > >> I would argue any calls into the routine with *any* task already > >> locked are fishy at best (I'm assuming there is an established task > >> lock ordering which may be accidentally violated). > > > > A link to the original problem report is below, but the quick summary > > is a capable() check in the do_prlimit() code path. > > > > https://lore.kernel.org/audit/668dd928-b00d-4f7a-5e6a-b6efc6a9c08f@canonical.com > > While get_task_exe_file working without taking the task lock would be > nice, I think the *real* bug is do_prlimit calling into > capabilities/security machinery with said lock held to begin with. I'm not going to argue about do_prlimit() taking the task_lock(), I can't say I've thought enough about that, but regardless of what happens there I think we can fix things on the audit side as well. Thinking about it a bit more this morning, I think we can safely ignore the non-@current case in audit_exe_compare() as the whole point of the audit exe filter is to record the actions of processes instantiated from that executable file; if the @current task is not being logged/filtered, we shouldn't have to worry about the exe filter. I'll send out a v2 of the patch later with those changes, but like I said at the top, please don't let that stop you making improvements elsewhere.
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > Thinking about it a bit more this morning, I think we can safely > ignore the non-@current case in audit_exe_compare() as the whole point > of the audit exe filter is to record the actions of processes > instantiated from that executable file; if the @current task is not > being logged/filtered, we shouldn't have to worry about the exe > filter. > I did a quick stab at figuring out whether one can get there with non-current to begin with, but did not convince myself it is not possible. That said, should you repost, I think refing and unrefing mm should be a voided. The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a ("prlimit: do not grab the tasklist_lock") which converted that lock to task_lock. So I don't think pointing at my patch as "Fixes" is accurate, but I'm not going to insist. ;) As for prlimit, I think I'm going to add the lock to signal struct for thread within one thread group. At the same time "remote" prlimit will instead resort to tasklist_lock, like it did prior to the above commit. Then the new lock can be converted to a seqlock, so that grabbing rlimits is cheaper. I'm going to create a new thread about it later.
On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > > Thinking about it a bit more this morning, I think we can safely > > ignore the non-@current case in audit_exe_compare() as the whole point > > of the audit exe filter is to record the actions of processes > > instantiated from that executable file; if the @current task is not > > being logged/filtered, we shouldn't have to worry about the exe > > filter. > > I did a quick stab at figuring out whether one can get there with > non-current to begin with, but did not convince myself it is not > possible. > > That said, should you repost, I think refing and unrefing mm should be a voided. We have to deref current->mm to get the exe_file, but so long as we get a reference with mmget()/mmput() it should be safe, no? > The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a > ("prlimit: do not grab the tasklist_lock") which converted that lock > to task_lock. So I don't think pointing at my patch as "Fixes" is > accurate, but I'm not going to insist. ;) Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did you copy-n-past the wrong commit or am I missing something? From what I can see, the task_lock() was first introduced back in 2009 with 86f162f4c75c ("rlimits: do security check under task_lock"). -- paul-moore.com
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote: >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> > Thinking about it a bit more this morning, I think we can safely >> > ignore the non-@current case in audit_exe_compare() as the whole point >> > of the audit exe filter is to record the actions of processes >> > instantiated from that executable file; if the @current task is not >> > being logged/filtered, we shouldn't have to worry about the exe >> > filter. >> >> I did a quick stab at figuring out whether one can get there with >> non-current to begin with, but did not convince myself it is not >> possible. >> >> That said, should you repost, I think refing and unrefing mm should be a >> voided. > > We have to deref current->mm to get the exe_file, but so long as we > get a reference with mmget()/mmput() it should be safe, no? > For task == current the very condition which allows you to safely mmget also makes the operation redundant -- current already has a ref on mm for as long as it executes. >> The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a >> ("prlimit: do not grab the tasklist_lock") which converted that lock >> to task_lock. So I don't think pointing at my patch as "Fixes" is >> accurate, but I'm not going to insist. ;) > > Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did > you copy-n-past the wrong commit or am I missing something? From what > I can see, the task_lock() was first introduced back in 2009 with > 86f162f4c75c ("rlimits: do security check under task_lock"). > Huh. That's some brainfarting by me, indeed with this in place it is my commit which regresses.
On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > >> > Thinking about it a bit more this morning, I think we can safely > >> > ignore the non-@current case in audit_exe_compare() as the whole point > >> > of the audit exe filter is to record the actions of processes > >> > instantiated from that executable file; if the @current task is not > >> > being logged/filtered, we shouldn't have to worry about the exe > >> > filter. > >> > >> I did a quick stab at figuring out whether one can get there with > >> non-current to begin with, but did not convince myself it is not > >> possible. > >> > >> That said, should you repost, I think refing and unrefing mm should be a > >> voided. > > > > We have to deref current->mm to get the exe_file, but so long as we > > get a reference with mmget()/mmput() it should be safe, no? > > For task == current the very condition which allows you to safely > mmget also makes the operation redundant -- current already has a ref > on mm for as long as it executes. I've been using the move_pages(2) syscall code as an example and in the find_mm_struct() function the code either does a mmget() if accessing current->mm or a get_task_mm() if accessing an arbitrary task. -> SYSCALL_DEFINE6(move_pages, ...) -> kernel_move_pages(...) -> find_mm_struct(...) What am I missing? > >> The bug showed up with 18c91bb2d87268d23868bf13508f5bc9cf04e89a > >> ("prlimit: do not grab the tasklist_lock") which converted that lock > >> to task_lock. So I don't think pointing at my patch as "Fixes" is > >> accurate, but I'm not going to insist. ;) > > > > Hmm, 18c91bb2d872 doesn't look like it adds a call to task_lock(), did > > you copy-n-past the wrong commit or am I missing something? From what > > I can see, the task_lock() was first introduced back in 2009 with > > 86f162f4c75c ("rlimits: do security check under task_lock"). > > > > Huh. That's some brainfarting by me, indeed with this in place it is > my commit which regresses. No worries, thanks for the confirmation.
On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote: >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> >> > wrote: >> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: >> >> > Thinking about it a bit more this morning, I think we can safely >> >> > ignore the non-@current case in audit_exe_compare() as the whole >> >> > point >> >> > of the audit exe filter is to record the actions of processes >> >> > instantiated from that executable file; if the @current task is not >> >> > being logged/filtered, we shouldn't have to worry about the exe >> >> > filter. >> >> >> >> I did a quick stab at figuring out whether one can get there with >> >> non-current to begin with, but did not convince myself it is not >> >> possible. >> >> >> >> That said, should you repost, I think refing and unrefing mm should be >> >> a >> >> voided. >> > >> > We have to deref current->mm to get the exe_file, but so long as we >> > get a reference with mmget()/mmput() it should be safe, no? >> >> For task == current the very condition which allows you to safely >> mmget also makes the operation redundant -- current already has a ref >> on mm for as long as it executes. > > I've been using the move_pages(2) syscall code as an example and in > the find_mm_struct() function the code either does a mmget() if > accessing current->mm or a get_task_mm() if accessing an arbitrary > task. > > -> SYSCALL_DEFINE6(move_pages, ...) > -> kernel_move_pages(...) > -> find_mm_struct(...) > > What am I missing? > To my reading they did not want to special-case it in the caller -- they just mmput regardless of what task is. Again, you may notice there are no locks held in that usecase. The pointer is blindly derefed and a refcount bumped. For this to be safe, it has to be guaranteed > 0 already, which it is for a task accessing it's own mm struct. You can find the releasing mmput in exit -> do_exit -> exit_mm callchain.
On Sat, Oct 21, 2023 at 9:51 AM Mateusz Guzik <mjguzik@gmail.com> wrote: > On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > > On Thu, Oct 19, 2023 at 12:56 PM Mateusz Guzik <mjguzik@gmail.com> wrote: > >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > >> > On Thu, Oct 19, 2023 at 10:52 AM Mateusz Guzik <mjguzik@gmail.com> > >> > wrote: > >> >> On 10/19/23, Paul Moore <paul@paul-moore.com> wrote: > >> >> > Thinking about it a bit more this morning, I think we can safely > >> >> > ignore the non-@current case in audit_exe_compare() as the whole > >> >> > point > >> >> > of the audit exe filter is to record the actions of processes > >> >> > instantiated from that executable file; if the @current task is not > >> >> > being logged/filtered, we shouldn't have to worry about the exe > >> >> > filter. > >> >> > >> >> I did a quick stab at figuring out whether one can get there with > >> >> non-current to begin with, but did not convince myself it is not > >> >> possible. > >> >> > >> >> That said, should you repost, I think refing and unrefing mm should be > >> >> a > >> >> voided. > >> > > >> > We have to deref current->mm to get the exe_file, but so long as we > >> > get a reference with mmget()/mmput() it should be safe, no? > >> > >> For task == current the very condition which allows you to safely > >> mmget also makes the operation redundant -- current already has a ref > >> on mm for as long as it executes. > > > > I've been using the move_pages(2) syscall code as an example and in > > the find_mm_struct() function the code either does a mmget() if > > accessing current->mm or a get_task_mm() if accessing an arbitrary > > task. > > > > -> SYSCALL_DEFINE6(move_pages, ...) > > -> kernel_move_pages(...) > > -> find_mm_struct(...) > > > > What am I missing? > > > > To my reading they did not want to special-case it in the caller -- > they just mmput regardless of what task is. > > Again, you may notice there are no locks held in that usecase. The > pointer is blindly derefed and a refcount bumped. For this to be safe, > it has to be guaranteed > 0 already, which it is for a task accessing > it's own mm struct. You can find the releasing mmput in exit -> > do_exit -> exit_mm callchain. I'll have a v2 out soon (later today?).
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c index 65075f1e4ac8..fa3e6ea0e58c 100644 --- a/kernel/audit_watch.c +++ b/kernel/audit_watch.c @@ -526,8 +526,20 @@ int audit_exe_compare(struct task_struct *tsk, struct audit_fsnotify_mark *mark) struct file *exe_file; unsigned long ino; dev_t dev; + struct mm_struct *mm; - exe_file = get_task_exe_file(tsk); + /* almost always operate on current, exceptions for fork/clone */ + if (likely(tsk == current)) { + mmget(current->mm); + mm = current->mm; + } else { + /* this can deadlock outside the fork/clone usage (see above) */ + mm = get_task_mm(tsk); + if (!mm) + return 0; + } + exe_file = get_mm_exe_file(mm); + mmput(mm); if (!exe_file) return 0; ino = file_inode(exe_file)->i_ino;
The get_task_exe_file() function locks the given task with task_lock() which when used inside audit_exe_compare() can cause deadlocks on systems that generate audit records when the task_lock() is held. As we should be able to safely access current->mm we can drop the call to get_task_exe_file() and get a reference to current->mm directly which we can then use in a call to get_mm_exe_file() without having to take task_lock(). While the audit_exe_compare() function takes an arbitrary task_struct parameter, all of the callers outside of fork/clone, call audit_exe_compare() to operate on the current task_struct, making it the common case. In the fork/clone case, when audit_exe_compare() is called on the child task_struct, it should be safe to use the get_task_mm() function as no other task should be holding task_lock() at this time. Cc: <stable@vger.kernel.org> Fixes: 5efc244346f9 ("audit: fix exe_file access in audit_exe_compare") Reported-by: Andreas Steinmetz <anstein99@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> --- kernel/audit_watch.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)