diff mbox series

[3/4] kernel.h: Add non_block_start/end()

Message ID 20190820081902.24815-4-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series mmu notifier debug annotations/checks | expand

Commit Message

Daniel Vetter Aug. 20, 2019, 8:19 a.m. UTC
In some special cases we must not block, but there's not a
spinlock, preempt-off, irqs-off or similar critical section already
that arms the might_sleep() debug checks. Add a non_block_start/end()
pair to annotate these.

This will be used in the oom paths of mmu-notifiers, where blocking is
not allowed to make sure there's forward progress. Quoting Michal:

"The notifier is called from quite a restricted context - oom_reaper -
which shouldn't depend on any locks or sleepable conditionals. The code
should be swift as well but we mostly do care about it to make a forward
progress. Checking for sleepable context is the best thing we could come
up with that would describe these demands at least partially."

Peter also asked whether we want to catch spinlocks on top, but Michal
said those are less of a problem because spinlocks can't have an
indirect dependency upon the page allocator and hence close the loop
with the oom reaper.

Suggested by Michal Hocko.

v2:
- Improve commit message (Michal)
- Also check in schedule, not just might_sleep (Peter)

v3: It works better when I actually squash in the fixup I had lying
around :-/

v4: Pick the suggestion from Andrew Morton to give non_block_start/end
some good kerneldoc comments. I added that other blocking calls like
wait_event pose similar issues, since that's the other example we
discussed.

Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: linux-mm@kvack.org
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Cc: Wei Wang <wvw@google.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jann Horn <jannh@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kernel@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com> (v1)
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 include/linux/kernel.h | 25 ++++++++++++++++++++++++-
 include/linux/sched.h  |  4 ++++
 kernel/sched/core.c    | 19 ++++++++++++++-----
 3 files changed, 42 insertions(+), 6 deletions(-)

Comments

Daniel Vetter Aug. 20, 2019, 8:24 p.m. UTC | #1
On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> In some special cases we must not block, but there's not a
> spinlock, preempt-off, irqs-off or similar critical section already
> that arms the might_sleep() debug checks. Add a non_block_start/end()
> pair to annotate these.
> 
> This will be used in the oom paths of mmu-notifiers, where blocking is
> not allowed to make sure there's forward progress. Quoting Michal:
> 
> "The notifier is called from quite a restricted context - oom_reaper -
> which shouldn't depend on any locks or sleepable conditionals. The code
> should be swift as well but we mostly do care about it to make a forward
> progress. Checking for sleepable context is the best thing we could come
> up with that would describe these demands at least partially."
> 
> Peter also asked whether we want to catch spinlocks on top, but Michal
> said those are less of a problem because spinlocks can't have an
> indirect dependency upon the page allocator and hence close the loop
> with the oom reaper.
> 
> Suggested by Michal Hocko.
> 
> v2:
> - Improve commit message (Michal)
> - Also check in schedule, not just might_sleep (Peter)
> 
> v3: It works better when I actually squash in the fixup I had lying
> around :-/
> 
> v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> some good kerneldoc comments. I added that other blocking calls like
> wait_event pose similar issues, since that's the other example we
> discussed.
> 
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: David Rientjes <rientjes@google.com>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: "Jérôme Glisse" <jglisse@redhat.com>
> Cc: linux-mm@kvack.org
> Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> Cc: Wei Wang <wvw@google.com>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Jann Horn <jannh@google.com>
> Cc: Feng Tang <feng.tang@intel.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-kernel@vger.kernel.org
> Acked-by: Christian König <christian.koenig@amd.com> (v1)
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>

Hi Peter,

Iirc you've been involved at least somewhat in discussing this. -mm folks
are a bit undecided whether these new non_block semantics are a good idea.
Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
are less enthusiastic. Jason said he's ok with merging the hmm side of
this if scheduler folks ack. If not, then I'll respin with the
preempt_disable/enable instead like in v1.

So ack/nack for this from the scheduler side?

Thanks, Daniel

> ---
>  include/linux/kernel.h | 25 ++++++++++++++++++++++++-
>  include/linux/sched.h  |  4 ++++
>  kernel/sched/core.c    | 19 ++++++++++++++-----
>  3 files changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> index 4fa360a13c1e..82f84cfe372f 100644
> --- a/include/linux/kernel.h
> +++ b/include/linux/kernel.h
> @@ -217,7 +217,9 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>   * might_sleep - annotation for functions that can sleep
>   *
>   * this macro will print a stack trace if it is executed in an atomic
> - * context (spinlock, irq-handler, ...).
> + * context (spinlock, irq-handler, ...). Additional sections where blocking is
> + * not allowed can be annotated with non_block_start() and non_block_end()
> + * pairs.
>   *
>   * This is a useful debugging help to be able to catch problems early and not
>   * be bitten later when the calling function happens to sleep when it is not
> @@ -233,6 +235,25 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>  # define cant_sleep() \
>  	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
>  # define sched_annotate_sleep()	(current->task_state_change = 0)
> +/**
> + * non_block_start - annotate the start of section where sleeping is prohibited
> + *
> + * This is on behalf of the oom reaper, specifically when it is calling the mmu
> + * notifiers. The problem is that if the notifier were to block on, for example,
> + * mutex_lock() and if the process which holds that mutex were to perform a
> + * sleeping memory allocation, the oom reaper is now blocked on completion of
> + * that memory allocation. Other blocking calls like wait_event() pose similar
> + * issues.
> + */
> +# define non_block_start() \
> +	do { current->non_block_count++; } while (0)
> +/**
> + * non_block_end - annotate the end of section where sleeping is prohibited
> + *
> + * Closes a section opened by non_block_start().
> + */
> +# define non_block_end() \
> +	do { WARN_ON(current->non_block_count-- == 0); } while (0)
>  #else
>    static inline void ___might_sleep(const char *file, int line,
>  				   int preempt_offset) { }
> @@ -241,6 +262,8 @@ extern void __cant_sleep(const char *file, int line, int preempt_offset);
>  # define might_sleep() do { might_resched(); } while (0)
>  # define cant_sleep() do { } while (0)
>  # define sched_annotate_sleep() do { } while (0)
> +# define non_block_start() do { } while (0)
> +# define non_block_end() do { } while (0)
>  #endif
>  
>  #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 9f51932bd543..c5630f3dca1f 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -974,6 +974,10 @@ struct task_struct {
>  	struct mutex_waiter		*blocked_on;
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	int				non_block_count;
> +#endif
> +
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	unsigned int			irq_events;
>  	unsigned long			hardirq_enable_ip;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2b037f195473..57245770d6cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3700,13 +3700,22 @@ static noinline void __schedule_bug(struct task_struct *prev)
>  /*
>   * Various schedule()-time debugging checks and statistics:
>   */
> -static inline void schedule_debug(struct task_struct *prev)
> +static inline void schedule_debug(struct task_struct *prev, bool preempt)
>  {
>  #ifdef CONFIG_SCHED_STACK_END_CHECK
>  	if (task_stack_end_corrupted(prev))
>  		panic("corrupted stack end detected inside scheduler\n");
>  #endif
>  
> +#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> +	if (!preempt && prev->state && prev->non_block_count) {
> +		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
> +			prev->comm, prev->pid, prev->non_block_count);
> +		dump_stack();
> +		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
> +	}
> +#endif
> +
>  	if (unlikely(in_atomic_preempt_off())) {
>  		__schedule_bug(prev);
>  		preempt_count_set(PREEMPT_DISABLED);
> @@ -3813,7 +3822,7 @@ static void __sched notrace __schedule(bool preempt)
>  	rq = cpu_rq(cpu);
>  	prev = rq->curr;
>  
> -	schedule_debug(prev);
> +	schedule_debug(prev, preempt);
>  
>  	if (sched_feat(HRTICK))
>  		hrtick_clear(rq);
> @@ -6570,7 +6579,7 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  	rcu_sleep_check();
>  
>  	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
> -	     !is_idle_task(current)) ||
> +	     !is_idle_task(current) && !current->non_block_count) ||
>  	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
>  	    oops_in_progress)
>  		return;
> @@ -6586,8 +6595,8 @@ void ___might_sleep(const char *file, int line, int preempt_offset)
>  		"BUG: sleeping function called from invalid context at %s:%d\n",
>  			file, line);
>  	printk(KERN_ERR
> -		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
> -			in_atomic(), irqs_disabled(),
> +		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
> +			in_atomic(), irqs_disabled(), current->non_block_count,
>  			current->pid, current->comm);
>  
>  	if (task_stack_end_corrupted(current))
> -- 
> 2.23.0.rc1
>
Andrew Morton Aug. 22, 2019, 11:14 p.m. UTC | #2
On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:

> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.

I became mollified once Michel explained the rationale.  I think it's
OK.  It's very specific to the oom reaper and hopefully won't be used
more widely(?).
Daniel Vetter Aug. 23, 2019, 8:34 a.m. UTC | #3
On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
>
> > Hi Peter,
> >
> > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > are a bit undecided whether these new non_block semantics are a good idea.
> > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > this if scheduler folks ack. If not, then I'll respin with the
> > preempt_disable/enable instead like in v1.
>
> I became mollified once Michel explained the rationale.  I think it's
> OK.  It's very specific to the oom reaper and hopefully won't be used
> more widely(?).

Yeah, no plans for that from me. And I hope the comment above them now
explains why they exist, so people think twice before using it in
random places.
-Daniel
Peter Zijlstra Aug. 23, 2019, 8:48 a.m. UTC | #4
On Tue, Aug 20, 2019 at 10:24:40PM +0200, Daniel Vetter wrote:
> On Tue, Aug 20, 2019 at 10:19:01AM +0200, Daniel Vetter wrote:
> > In some special cases we must not block, but there's not a
> > spinlock, preempt-off, irqs-off or similar critical section already
> > that arms the might_sleep() debug checks. Add a non_block_start/end()
> > pair to annotate these.
> > 
> > This will be used in the oom paths of mmu-notifiers, where blocking is
> > not allowed to make sure there's forward progress. Quoting Michal:
> > 
> > "The notifier is called from quite a restricted context - oom_reaper -
> > which shouldn't depend on any locks or sleepable conditionals. The code
> > should be swift as well but we mostly do care about it to make a forward
> > progress. Checking for sleepable context is the best thing we could come
> > up with that would describe these demands at least partially."
> > 
> > Peter also asked whether we want to catch spinlocks on top, but Michal
> > said those are less of a problem because spinlocks can't have an
> > indirect dependency upon the page allocator and hence close the loop
> > with the oom reaper.
> > 
> > Suggested by Michal Hocko.
> > 
> > v2:
> > - Improve commit message (Michal)
> > - Also check in schedule, not just might_sleep (Peter)
> > 
> > v3: It works better when I actually squash in the fixup I had lying
> > around :-/
> > 
> > v4: Pick the suggestion from Andrew Morton to give non_block_start/end
> > some good kerneldoc comments. I added that other blocking calls like
> > wait_event pose similar issues, since that's the other example we
> > discussed.
> > 
> > Cc: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: "Christian König" <christian.koenig@amd.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > Cc: linux-mm@kvack.org
> > Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
> > Cc: Wei Wang <wvw@google.com>
> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Feng Tang <feng.tang@intel.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: linux-kernel@vger.kernel.org
> > Acked-by: Christian König <christian.koenig@amd.com> (v1)
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> 
> Hi Peter,
> 
> Iirc you've been involved at least somewhat in discussing this. -mm folks
> are a bit undecided whether these new non_block semantics are a good idea.
> Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> are less enthusiastic. Jason said he's ok with merging the hmm side of
> this if scheduler folks ack. If not, then I'll respin with the
> preempt_disable/enable instead like in v1.
> 
> So ack/nack for this from the scheduler side?

Right, I had memories of seeing this before, and I just found a fairly
long discussion on this elsewhere in the vacation inbox (*groan*).

Yeah, this is something I can live with,

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Jason Gunthorpe Aug. 23, 2019, 12:12 p.m. UTC | #5
On Fri, Aug 23, 2019 at 10:34:01AM +0200, Daniel Vetter wrote:
> On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > > Hi Peter,
> > >
> > > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > > are a bit undecided whether these new non_block semantics are a good idea.
> > > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > > this if scheduler folks ack. If not, then I'll respin with the
> > > preempt_disable/enable instead like in v1.
> >
> > I became mollified once Michel explained the rationale.  I think it's
> > OK.  It's very specific to the oom reaper and hopefully won't be used
> > more widely(?).
> 
> Yeah, no plans for that from me. And I hope the comment above them now
> explains why they exist, so people think twice before using it in
> random places.

I still haven't heard a satisfactory answer why a whole new scheme is
needed and a simple:

   if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
        preempt_disable()

isn't sufficient to catch the problematic cases during debugging??
IMHO the fact preempt is changed by the above when debugging is not
material here. I think that information should be included in the
commit message at least.

But if sched people are happy then lets go ahead. Can you send a v2
with the check encompassing the invalidate_range_end?

Jason
Peter Zijlstra Aug. 23, 2019, 12:22 p.m. UTC | #6
On Fri, Aug 23, 2019 at 09:12:34AM -0300, Jason Gunthorpe wrote:

> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
> 
>    if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
>         preempt_disable()
> 
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.

That has a much larger impact and actually changes behaviour, while the
relatively simple patch Daniel proposed only adds a warning but doesn't
affect behaviour.
Daniel Vetter Aug. 23, 2019, 1:42 p.m. UTC | #7
On Fri, Aug 23, 2019 at 2:12 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Fri, Aug 23, 2019 at 10:34:01AM +0200, Daniel Vetter wrote:
> > On Fri, Aug 23, 2019 at 1:14 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> > >
> > > On Tue, 20 Aug 2019 22:24:40 +0200 Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > > Hi Peter,
> > > >
> > > > Iirc you've been involved at least somewhat in discussing this. -mm folks
> > > > are a bit undecided whether these new non_block semantics are a good idea.
> > > > Michal Hocko still is in support, but Andrew Morton and Jason Gunthorpe
> > > > are less enthusiastic. Jason said he's ok with merging the hmm side of
> > > > this if scheduler folks ack. If not, then I'll respin with the
> > > > preempt_disable/enable instead like in v1.
> > >
> > > I became mollified once Michel explained the rationale.  I think it's
> > > OK.  It's very specific to the oom reaper and hopefully won't be used
> > > more widely(?).
> >
> > Yeah, no plans for that from me. And I hope the comment above them now
> > explains why they exist, so people think twice before using it in
> > random places.
>
> I still haven't heard a satisfactory answer why a whole new scheme is
> needed and a simple:
>
>    if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
>         preempt_disable()
>
> isn't sufficient to catch the problematic cases during debugging??
> IMHO the fact preempt is changed by the above when debugging is not
> material here. I think that information should be included in the
> commit message at least.
>
> But if sched people are happy then lets go ahead. Can you send a v2
> with the check encompassing the invalidate_range_end?

Yes I will resend with this patch plus the next, amended as we
discussed, plus the might_sleep annotations. I'm assuming the lockdep
one will land, so not going to resend that.
-Daniel
Peter Zijlstra Aug. 23, 2019, 2:06 p.m. UTC | #8
On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> I'm assuming the lockdep one will land, so not going to resend that.

I was assuming you'd wake the might_lock_nested() along with the i915
user through the i915/drm tree. If want me to take some or all of that,
lemme know.
Daniel Vetter Aug. 23, 2019, 3:15 p.m. UTC | #9
On Fri, Aug 23, 2019 at 4:06 PM Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Aug 23, 2019 at 03:42:47PM +0200, Daniel Vetter wrote:
> > I'm assuming the lockdep one will land, so not going to resend that.
>
> I was assuming you'd wake the might_lock_nested() along with the i915
> user through the i915/drm tree. If want me to take some or all of that,
> lemme know.

might_lock_nested() is a different patch series, that one will indeed
go in through the drm/i915 tree, thx for the ack there. What I meant
here is some mmu notifier lockdep map in this series that Jason said
he's going to pick up into hmm.git. I'm doing about 3 or 4 different
lockdep annotations series in parallel right now :-)
-Daniel
diff mbox series

Patch

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 4fa360a13c1e..82f84cfe372f 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -217,7 +217,9 @@  extern void __cant_sleep(const char *file, int line, int preempt_offset);
  * might_sleep - annotation for functions that can sleep
  *
  * this macro will print a stack trace if it is executed in an atomic
- * context (spinlock, irq-handler, ...).
+ * context (spinlock, irq-handler, ...). Additional sections where blocking is
+ * not allowed can be annotated with non_block_start() and non_block_end()
+ * pairs.
  *
  * This is a useful debugging help to be able to catch problems early and not
  * be bitten later when the calling function happens to sleep when it is not
@@ -233,6 +235,25 @@  extern void __cant_sleep(const char *file, int line, int preempt_offset);
 # define cant_sleep() \
 	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+/**
+ * non_block_start - annotate the start of section where sleeping is prohibited
+ *
+ * This is on behalf of the oom reaper, specifically when it is calling the mmu
+ * notifiers. The problem is that if the notifier were to block on, for example,
+ * mutex_lock() and if the process which holds that mutex were to perform a
+ * sleeping memory allocation, the oom reaper is now blocked on completion of
+ * that memory allocation. Other blocking calls like wait_event() pose similar
+ * issues.
+ */
+# define non_block_start() \
+	do { current->non_block_count++; } while (0)
+/**
+ * non_block_end - annotate the end of section where sleeping is prohibited
+ *
+ * Closes a section opened by non_block_start().
+ */
+# define non_block_end() \
+	do { WARN_ON(current->non_block_count-- == 0); } while (0)
 #else
   static inline void ___might_sleep(const char *file, int line,
 				   int preempt_offset) { }
@@ -241,6 +262,8 @@  extern void __cant_sleep(const char *file, int line, int preempt_offset);
 # define might_sleep() do { might_resched(); } while (0)
 # define cant_sleep() do { } while (0)
 # define sched_annotate_sleep() do { } while (0)
+# define non_block_start() do { } while (0)
+# define non_block_end() do { } while (0)
 #endif
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 9f51932bd543..c5630f3dca1f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -974,6 +974,10 @@  struct task_struct {
 	struct mutex_waiter		*blocked_on;
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	int				non_block_count;
+#endif
+
 #ifdef CONFIG_TRACE_IRQFLAGS
 	unsigned int			irq_events;
 	unsigned long			hardirq_enable_ip;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2b037f195473..57245770d6cc 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3700,13 +3700,22 @@  static noinline void __schedule_bug(struct task_struct *prev)
 /*
  * Various schedule()-time debugging checks and statistics:
  */
-static inline void schedule_debug(struct task_struct *prev)
+static inline void schedule_debug(struct task_struct *prev, bool preempt)
 {
 #ifdef CONFIG_SCHED_STACK_END_CHECK
 	if (task_stack_end_corrupted(prev))
 		panic("corrupted stack end detected inside scheduler\n");
 #endif
 
+#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
+	if (!preempt && prev->state && prev->non_block_count) {
+		printk(KERN_ERR "BUG: scheduling in a non-blocking section: %s/%d/%i\n",
+			prev->comm, prev->pid, prev->non_block_count);
+		dump_stack();
+		add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+	}
+#endif
+
 	if (unlikely(in_atomic_preempt_off())) {
 		__schedule_bug(prev);
 		preempt_count_set(PREEMPT_DISABLED);
@@ -3813,7 +3822,7 @@  static void __sched notrace __schedule(bool preempt)
 	rq = cpu_rq(cpu);
 	prev = rq->curr;
 
-	schedule_debug(prev);
+	schedule_debug(prev, preempt);
 
 	if (sched_feat(HRTICK))
 		hrtick_clear(rq);
@@ -6570,7 +6579,7 @@  void ___might_sleep(const char *file, int line, int preempt_offset)
 	rcu_sleep_check();
 
 	if ((preempt_count_equals(preempt_offset) && !irqs_disabled() &&
-	     !is_idle_task(current)) ||
+	     !is_idle_task(current) && !current->non_block_count) ||
 	    system_state == SYSTEM_BOOTING || system_state > SYSTEM_RUNNING ||
 	    oops_in_progress)
 		return;
@@ -6586,8 +6595,8 @@  void ___might_sleep(const char *file, int line, int preempt_offset)
 		"BUG: sleeping function called from invalid context at %s:%d\n",
 			file, line);
 	printk(KERN_ERR
-		"in_atomic(): %d, irqs_disabled(): %d, pid: %d, name: %s\n",
-			in_atomic(), irqs_disabled(),
+		"in_atomic(): %d, irqs_disabled(): %d, non_block: %d, pid: %d, name: %s\n",
+			in_atomic(), irqs_disabled(), current->non_block_count,
 			current->pid, current->comm);
 
 	if (task_stack_end_corrupted(current))