| Message ID | 20181122165106.18238-3-daniel.vetter@ffwll.ch (mailing list archive) |
|---|---|
| State | New, archived |
| Series | RFC: mmu notifier debug checks |
On 22.11.18 at 17:51, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 59e102589a25..4d282cfb296e 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>  		if (mn->ops->invalidate_range_start) {
> -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> +			int _ret;
> +
> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> +				preempt_disable();
> +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> +				preempt_enable();

Just for the sake of better documenting this how about adding this to
include/linux/kernel.h right next to might_sleep():

#define disallow_sleeping_if(cond) for((cond) ? preempt_disable() :
(void)0; (cond); preempt_disable())

(Just from the back of my head, might contain peanuts and/or hints of
errors).

Christian.

>  			if (_ret) {
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  					mn->ops->invalidate_range_start, _ret,
On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
> On 22.11.18 at 17:51, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> >  	id = srcu_read_lock(&srcu);
> >  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> >  		if (mn->ops->invalidate_range_start) {
> > -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			int _ret;
> > +
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_disable();
> > +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_enable();
>
> Just for the sake of better documenting this how about adding this to
> include/linux/kernel.h right next to might_sleep():
>
> #define disallow_sleeping_if(cond) for((cond) ? preempt_disable() :
> (void)0; (cond); preempt_disable())
>
> (Just from the back of my head, might contain peanuts and/or hints of
> errors).

I think these magic for blocks aren't used in the kernel. goto breaks
them, and we use goto a lot.

I think a disallow/allow_sleep() pair with the conditional
preempt_disable/enable() calls would be nice though. I can do that if
the overall idea sticks.
-Daniel

> Christian.
>
> >  			if (_ret) {
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  					mn->ops->invalidate_range_start, _ret,
On 23.11.18 at 09:46, Daniel Vetter wrote:
> On Thu, Nov 22, 2018 at 06:55:17PM +0000, Koenig, Christian wrote:
>> On 22.11.18 at 17:51, Daniel Vetter wrote:
>>> We need to make sure implementations don't cheat and don't have a
>>> possible schedule/blocking point deeply buried where review can't
>>> catch it.
>>>
>>> I'm not sure whether this is the best way to make sure all the
>>> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
>>> But it gets the job done.
>>>
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: Michal Hocko <mhocko@suse.com>
>>> Cc: David Rientjes <rientjes@google.com>
>>> Cc: "Christian König" <christian.koenig@amd.com>
>>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>>> Cc: "Jérôme Glisse" <jglisse@redhat.com>
>>> Cc: linux-mm@kvack.org
>>> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
>>> ---
>>>  mm/mmu_notifier.c | 8 +++++++-
>>>  1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
>>> index 59e102589a25..4d282cfb296e 100644
>>> --- a/mm/mmu_notifier.c
>>> +++ b/mm/mmu_notifier.c
>>> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>>>  	id = srcu_read_lock(&srcu);
>>>  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>>>  		if (mn->ops->invalidate_range_start) {
>>> -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
>>> +			int _ret;
>>> +
>>> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
>>> +				preempt_disable();
>>> +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
>>> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
>>> +				preempt_enable();
>> Just for the sake of better documenting this how about adding this to
>> include/linux/kernel.h right next to might_sleep():
>>
>> #define disallow_sleeping_if(cond) for((cond) ? preempt_disable() :
>> (void)0; (cond); preempt_disable())
>>
>> (Just from the back of my head, might contain peanuts and/or hints of
>> errors).
> I think these magic for blocks aren't used in the kernel. goto breaks
> them, and we use goto a lot.

Yeah, good argument.

> I think a disallow/allow_sleep() pair with the conditional
> preempt_disable/enable() calls would be nice though. I can do that if
> the overall idea sticks.

Sounds like a good idea to me as well.

Christian.

> -Daniel
>
>> Christian.
>>
>>>  			if (_ret) {
>>>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>>>  					mn->ops->invalidate_range_start, _ret,
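For concreteness, here is what the disallow/allow_sleep() pair floated
above could look like; a minimal sketch only, with hypothetical helper
names taken from Daniel's mail rather than an existing kernel API:

```c
/*
 * Hypothetical sketch of the disallow/allow_sleep() idea from this
 * thread.  Conditionally disabling preemption makes every
 * might_sleep() in the bracketed region splat when
 * CONFIG_DEBUG_ATOMIC_SLEEP is enabled, and an explicit begin/end
 * pair keeps the control flow visible in a way a for-based macro
 * does not.
 */
static inline void disallow_sleep(bool cond)
{
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && cond)
		preempt_disable();
}

static inline void allow_sleep(bool cond)
{
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && cond)
		preempt_enable();
}
```

The call site in __mmu_notifier_invalidate_range_start() would then
bracket the callback with disallow_sleep(!blockable) and
allow_sleep(!blockable) instead of open-coding the IS_ENABLED() checks.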
On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> We need to make sure implementations don't cheat and don't have a
> possible schedule/blocking point deeply buried where review can't
> catch it.
>
> I'm not sure whether this is the best way to make sure all the
> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> But it gets the job done.

Yeah, it is quite ugly. Especially because it makes DEBUG config
behavior much different. So is this really worth it? Has this already
discovered any existing bug?

> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  mm/mmu_notifier.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> index 59e102589a25..4d282cfb296e 100644
> --- a/mm/mmu_notifier.c
> +++ b/mm/mmu_notifier.c
> @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
>  	id = srcu_read_lock(&srcu);
>  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
>  		if (mn->ops->invalidate_range_start) {
> -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> +			int _ret;
> +
> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> +				preempt_disable();
> +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> +				preempt_enable();
>  			if (_ret) {
>  				pr_info("%pS callback failed with %d in %sblockable context.\n",
>  					mn->ops->invalidate_range_start, _ret,
> --
> 2.19.1
>
On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > We need to make sure implementations don't cheat and don't have a
> > possible schedule/blocking point deeply buried where review can't
> > catch it.
> >
> > I'm not sure whether this is the best way to make sure all the
> > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > But it gets the job done.
>
> Yeah, it is quite ugly. Especially because it makes DEBUG config
> behavior much different. So is this really worth it? Has this already
> discovered any existing bug?

Given that we need an oom trigger to hit this we're not hitting this in
CI (oom is just way too unpredictable to even try). I'd kinda like to
also add some debug interface so I can provoke an oom kill of a
specially prepared process, to make sure we can reliably exercise this
path without killing the kernel accidentally. We do similar tricks for
our shrinker already.

There have been patches floating around with this kind of bug I think,
and the call chains we're dealing with are fairly deep. I don't trust
review to reliably catch this kind of fail, that's why I'm looking into
tools to better validate this stuff to augment review.

And yes it's ugly :-/

Wrt the behavior difference: I guess we could put another counter into
the task struct, and change might_sleep() to check it. All under
CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the
preempt-disable side effect. My worry with that is that people will
spot it, and abuse it in creative ways that do affect semantics. See
horrors like drm_can_sleep() (and I'm sure gfx folks are not the only
ones who seriously lacked taste here).

Up to the experts really how to best paint this shed I think.

Thanks, Daniel

> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  mm/mmu_notifier.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
> > index 59e102589a25..4d282cfb296e 100644
> > --- a/mm/mmu_notifier.c
> > +++ b/mm/mmu_notifier.c
> > @@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
> >  	id = srcu_read_lock(&srcu);
> >  	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
> >  		if (mn->ops->invalidate_range_start) {
> > -			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			int _ret;
> > +
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_disable();
> > +			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
> > +			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
> > +				preempt_enable();
> >  			if (_ret) {
> >  				pr_info("%pS callback failed with %d in %sblockable context.\n",
> >  					mn->ops->invalidate_range_start, _ret,
> > --
> > 2.19.1
> >
>
> --
> Michal Hocko
> SUSE Labs
On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> > On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > > We need to make sure implementations don't cheat and don't have a
> > > possible schedule/blocking point deeply buried where review can't
> > > catch it.
> > >
> > > I'm not sure whether this is the best way to make sure all the
> > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > But it gets the job done.
> >
> > Yeah, it is quite ugly. Especially because it makes DEBUG config
> > behavior much different. So is this really worth it? Has this already
> > discovered any existing bug?
>
> Given that we need an oom trigger to hit this we're not hitting this in
> CI (oom is just way too unpredictable to even try). I'd kinda like to
> also add some debug interface so I can provoke an oom kill of a
> specially prepared process, to make sure we can reliably exercise this
> path without killing the kernel accidentally. We do similar tricks for
> our shrinker already.

Create a task with oom_score_adj = 1000 and trigger the oom killer via
sysrq and you should get a predictable oom invocation and execution.

[...]

> Wrt the behavior difference: I guess we could put another counter into
> the task struct, and change might_sleep() to check it. All under
> CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the
> preempt-disable side effect. My worry with that is that people will
> spot it, and abuse it in creative ways that do affect semantics. See
> horrors like drm_can_sleep() (and I'm sure gfx folks are not the only
> ones who seriously lacked taste here).
>
> Up to the experts really how to best paint this shed I think.

Actually I like a way to say non_block_{begin,end} and have might_sleep
firing inside that context.
On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
> > On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
> > > On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
> > > > We need to make sure implementations don't cheat and don't have a
> > > > possible schedule/blocking point deeply buried where review can't
> > > > catch it.
> > > >
> > > > I'm not sure whether this is the best way to make sure all the
> > > > might_sleep() callsites trigger, and it's a bit ugly in the code flow.
> > > > But it gets the job done.
> > >
> > > Yeah, it is quite ugly. Especially because it makes DEBUG config
> > > behavior much different. So is this really worth it? Has this already
> > > discovered any existing bug?
> >
> > Given that we need an oom trigger to hit this we're not hitting this in
> > CI (oom is just way too unpredictable to even try). I'd kinda like to
> > also add some debug interface so I can provoke an oom kill of a
> > specially prepared process, to make sure we can reliably exercise this
> > path without killing the kernel accidentally. We do similar tricks for
> > our shrinker already.
>
> Create a task with oom_score_adj = 1000 and trigger the oom killer via
> sysrq and you should get a predictable oom invocation and execution.

Ah right. We kinda do that already in an attempt to get the tests
killed rather than the runner on an accidental oom. Just didn't think
about this in the context of intentionally firing the oom. I'll see
whether I can bake up some new subtest in our userptr/mmu-notifier
testcases.

> [...]
> > Wrt the behavior difference: I guess we could put another counter into
> > the task struct, and change might_sleep() to check it. All under
> > CONFIG_DEBUG_ATOMIC_SLEEP only ofc. That would avoid the
> > preempt-disable side effect. My worry with that is that people will
> > spot it, and abuse it in creative ways that do affect semantics. See
> > horrors like drm_can_sleep() (and I'm sure gfx folks are not the only
> > ones who seriously lacked taste here).
> >
> > Up to the experts really how to best paint this shed I think.
>
> Actually I like a way to say non_block_{begin,end} and have might_sleep
> firing inside that context.

Ok, I'll respin with these (introduced in a separate patch).
-Daniel
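For illustration, the non_block_{begin,end} idea could be sketched
roughly as below; the per-task counter is an assumption here (no such
field existed at this point), and the might_sleep() machinery would
have to learn to warn whenever it is non-zero:

```c
/*
 * Illustrative sketch only: mark a region in which sleeping is
 * forbidden by bumping a hypothetical counter in struct task_struct
 * instead of disabling preemption.  ___might_sleep() would
 * additionally warn whenever current->non_block_count != 0.
 */
#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define non_block_begin()	(current->non_block_count++)
# define non_block_end()	WARN_ON(current->non_block_count-- == 0)
#else
# define non_block_begin()	do { } while (0)
# define non_block_end()	do { } while (0)
#endif
```

Compared to the preempt_disable() trick this has no side effect on
scheduling, at the cost of being one more knob that, as noted above,
people could abuse in creative ways.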
On 23/11/2018 13:12, Daniel Vetter wrote:
> On Fri, Nov 23, 2018 at 1:46 PM Michal Hocko <mhocko@kernel.org> wrote:
>>
>> On Fri 23-11-18 13:38:38, Daniel Vetter wrote:
>>> On Fri, Nov 23, 2018 at 12:12:37PM +0100, Michal Hocko wrote:
>>>> On Thu 22-11-18 17:51:05, Daniel Vetter wrote:
>>>>> We need to make sure implementations don't cheat and don't have a
>>>>> possible schedule/blocking point deeply buried where review can't
>>>>> catch it.
>>>>>
>>>>> I'm not sure whether this is the best way to make sure all the
>>>>> might_sleep() callsites trigger, and it's a bit ugly in the code flow.
>>>>> But it gets the job done.
>>>>
>>>> Yeah, it is quite ugly. Especially because it makes DEBUG config
>>>> behavior much different. So is this really worth it? Has this already
>>>> discovered any existing bug?
>>>
>>> Given that we need an oom trigger to hit this we're not hitting this in
>>> CI (oom is just way too unpredictable to even try). I'd kinda like to
>>> also add some debug interface so I can provoke an oom kill of a
>>> specially prepared process, to make sure we can reliably exercise this
>>> path without killing the kernel accidentally. We do similar tricks for
>>> our shrinker already.
>>
>> Create a task with oom_score_adj = 1000 and trigger the oom killer via
>> sysrq and you should get a predictable oom invocation and execution.
>
> Ah right. We kinda do that already in an attempt to get the tests
> killed rather than the runner on an accidental oom. Just didn't think
> about this in the context of intentionally firing the oom. I'll see
> whether I can bake up some new subtest in our userptr/mmu-notifier
> testcases.

Very handy trick - I will look into applying it in the shrinker area as
well.

Regards,

Tvrtko
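Michal's trick translates into a small victim process that a testcase
can fork; a sketch under the assumption that the test runner triggers
the oom killer afterwards, e.g. as root via
"echo f > /proc/sysrq-trigger":

```c
/*
 * Deterministic oom victim (userspace sketch): with oom_score_adj set
 * to 1000, the oom killer always selects this task first, so a test
 * can trigger an oom and know exactly what gets killed.
 */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/self/oom_score_adj", O_WRONLY);

	if (fd < 0)
		return 1;
	if (write(fd, "1000", 4) != 4)
		return 1;
	close(fd);

	pause();	/* wait here until the oom killer picks us */
	return 0;
}
```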
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 59e102589a25..4d282cfb296e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -185,7 +185,13 @@ int __mmu_notifier_invalidate_range_start(struct mm_struct *mm,
 	id = srcu_read_lock(&srcu);
 	hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list, hlist) {
 		if (mn->ops->invalidate_range_start) {
-			int _ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			int _ret;
+
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_disable();
+			_ret = mn->ops->invalidate_range_start(mn, mm, start, end, blockable);
+			if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP) && !blockable)
+				preempt_enable();
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					mn->ops->invalidate_range_start, _ret,
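To make the failure mode concrete: a made-up notifier implementation
like the one below takes a sleeping lock regardless of blockable. With
the hunk above applied and CONFIG_DEBUG_ATOMIC_SLEEP enabled, the
might_sleep() inside mutex_lock() now splats immediately in the
non-blockable case, instead of the bug staying hidden until a real oom
walks this path:

```c
/*
 * Hypothetical buggy callback of the kind this check is meant to
 * catch; struct my_dev and its lock are invented for illustration.
 */
static int buggy_invalidate_range_start(struct mmu_notifier *mn,
					struct mm_struct *mm,
					unsigned long start,
					unsigned long end,
					bool blockable)
{
	struct my_dev *dev = container_of(mn, struct my_dev, mn);

	/*
	 * mutex_lock() contains a might_sleep() annotation; calling it
	 * unconditionally here ignores blockable == false, which is
	 * exactly the cheating the debug check makes visible.
	 */
	mutex_lock(&dev->lock);
	/* ... invalidate device mappings in [start, end) ... */
	mutex_unlock(&dev->lock);

	return 0;
}
```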
We need to make sure implementations don't cheat and don't have a
possible schedule/blocking point deeply buried where review can't
catch it.

I'm not sure whether this is the best way to make sure all the
might_sleep() callsites trigger, and it's a bit ugly in the code flow.
But it gets the job done.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 mm/mmu_notifier.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)