Message ID | 20200715183537.4010-1-urezki@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/1] rcu/tree: Drop the lock before entering to page allocator |
On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> 		return false;
>
> +	preempt_disable();
> +	krc_this_cpu_unlock(*krcp, *flags);

Now you enter memory allocator with disabled preemption. This isn't any
better but we don't have a warning for this yet.
What happened to the part where I asked for a spinlock_t?

> +
> 	/*
> 	 * NOTE: For one argument of kvfree_rcu() we can
> 	 * drop the lock and get the page in sleepable

Sebastian
On Wed, Jul 15, 2020 at 08:56:28PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > 		return false;
> >
> > +	preempt_disable();
> > +	krc_this_cpu_unlock(*krcp, *flags);
>
> Now you enter memory allocator with disabled preemption. This isn't any
> better but we don't have a warning for this yet.
>
Please elaborate why it is not allowed? Consider the code:

<snip>
    spin_lock();
    __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
    spin_unlock();
<snip>

Also, please note we do it for regular kernel.

>
> What happened to the part where I asked for a spinlock_t?
>
What do you mean?

--
Vlad Rezki
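A minimal, self-contained sketch of the pattern Vlad is pointing at above (illustrative only; the lock and function names are made up and are not from the patch or the thread): an atomic page allocation performed while a plain spinlock is held, which is fine on a !PREEMPT_RT kernel because GFP_NOWAIT forbids the allocator from sleeping.

<snip>
/* Hypothetical illustration, not code from the patch. */
static DEFINE_SPINLOCK(example_lock);

static void *grab_page_atomically(void)
{
	unsigned long page;

	spin_lock(&example_lock);
	/* GFP_NOWAIT: never sleep; __GFP_NOWARN: no failure warning. */
	page = __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
	spin_unlock(&example_lock);

	/* May be 0 under memory pressure; callers must handle failure. */
	return (void *)page;
}
<snip>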
On 2020-07-15 21:02:43 [+0200], Uladzislau Rezki wrote:
>
> <snip>
>     spin_lock();
>     __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
>     spin_unlock();
> <snip>
>
> Also, please note we do it for regular kernel.

ach right okay then.

> >
> > What happened to the part where I asked for a spinlock_t?
> >
> What do you mean?

Please drop that raw_spinlock_t for the kfree_rcu() based locking and
use just a plain spinlock_t for the locking. Then you can keep the same
code flow for RT and !RT without any special cases and everything.

> --
> Vlad Rezki

Sebastian
On Wed, Jul 15, 2020 at 09:32:50PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-15 21:02:43 [+0200], Uladzislau Rezki wrote:
> >
> > <snip>
> >     spin_lock();
> >     __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >     spin_unlock();
> > <snip>
> >
> > Also, please note we do it for regular kernel.
>
> ach right okay then.
>
> > >
> > > What happened to the part where I asked for a spinlock_t?
> > >
> > What do you mean?
>
> Please drop that raw_spinlock_t for the kfree_rcu() based locking and
> use just a plain spinlock_t for the locking. Then you can keep the same
> code flow for RT and !RT without any special cases and everything.
>
Ahhh... That i see :) I guess it is up to Paul. I wrote my thoughts.
I have only one concern that i explained in my previous mails.

Thank you, Sebastian, for your comments!

--
Vlad Rezki
On Wed, Jul 15, 2020 at 09:32:50PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-15 21:02:43 [+0200], Uladzislau Rezki wrote:
> >
> > <snip>
> >     spin_lock();
> >     __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
> >     spin_unlock();
> > <snip>
> >
> > Also, please note we do it for regular kernel.
>
> ach right okay then.
>
> > >
> > > What happened to the part where I asked for a spinlock_t?
> > >
> > What do you mean?
>
> Please drop that raw_spinlock_t for the kfree_rcu() based locking and
> use just a plain spinlock_t for the locking. Then you can keep the same
> code flow for RT and !RT without any special cases and everything.

My concern is that some critical bug will show up at some point
that requires double-argument kfree_rcu() be invoked while holding
a raw spinlock. (Single-argument kfree_rcu() must sometimes invoke
synchronize_rcu(), so it can never be invoked in any state forbidding
invoking schedule().)

Yes, dropping to a plain spinlock would be simple in the here and now,
but experience indicates that it is only a matter of time, and that when
that time comes it will come as an emergency.

One approach would be to replace the "IS_ENABLED(CONFIG_PREEMPT_RT)"
with some sort of check for being in a context where spinlock acquisition
is not legal. What could be done along those lines?

							Thanx, Paul
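For illustration, the kind of future caller Paul is worried about might look like the sketch below (struct foo, obj_lock and drop_obj() are hypothetical names, not existing kernel code): double-argument kfree_rcu() invoked while a raw_spinlock_t is held, which would break if kfree_rcu() internally had to take a spinlock_t that sleeps on RT.

<snip>
struct foo {
	struct rcu_head rcu;
	int data;
};

static DEFINE_RAW_SPINLOCK(obj_lock);

static void drop_obj(struct foo *obj)
{
	raw_spin_lock(&obj_lock);
	/* ... unlink obj from a structure protected by obj_lock ... */
	kfree_rcu(obj, rcu);	/* must remain legal in this context */
	raw_spin_unlock(&obj_lock);
}
<snip>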
On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
<bigeasy@linutronix.de> wrote:
>
> On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > 		return false;
> >
> > +	preempt_disable();
> > +	krc_this_cpu_unlock(*krcp, *flags);
>
> Now you enter memory allocator with disabled preemption. This isn't any
> better but we don't have a warning for this yet.
> What happened to the part where I asked for a spinlock_t?

Ulad,
Wouldn't the replacing of preempt_disable() with migrate_disable()
above resolve Sebastian's issue?

Or which scenario breaks?

thanks,

- Joel

> > +
> > 	/*
> > 	 * NOTE: For one argument of kvfree_rcu() we can
> > 	 * drop the lock and get the page in sleepable
>
> Sebastian
On Wed, Jul 15, 2020 at 07:13:33PM -0400, Joel Fernandes wrote:
> On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
> <bigeasy@linutronix.de> wrote:
> >
> > On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > 		return false;
> > >
> > > +	preempt_disable();
> > > +	krc_this_cpu_unlock(*krcp, *flags);
> >
> > Now you enter memory allocator with disabled preemption. This isn't any
> > better but we don't have a warning for this yet.
> > What happened to the part where I asked for a spinlock_t?
>
> Ulad,
> Wouldn't the replacing of preempt_disable() with migrate_disable()
> above resolve Sebastian's issue?
>
This for regular kernel only. That means that migrate_disable() is
equal to preempt_disable(). So, no difference.

> Or which scenario breaks?
>
Proposed patch fixes Sebastian's finding about
CONFIG_PROVE_RAW_LOCK_NESTING kernel option, that checks nesting rules
and forbids raw_spinlock versus spinlock mixing.

Sebastian, could you please confirm that if that patch that is in
question fixes it?

It would be appreciated!

--
Vlad Rezki
On Thu, Jul 16, 2020 at 11:19:13AM +0200, Uladzislau Rezki wrote:
> On Wed, Jul 15, 2020 at 07:13:33PM -0400, Joel Fernandes wrote:
> > On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
> > <bigeasy@linutronix.de> wrote:
> > >
> > > On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > > > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > 		return false;
> > > >
> > > > +	preempt_disable();
> > > > +	krc_this_cpu_unlock(*krcp, *flags);
> > >
> > > Now you enter memory allocator with disabled preemption. This isn't any
> > > better but we don't have a warning for this yet.
> > > What happened to the part where I asked for a spinlock_t?
> >
> > Ulad,
> > Wouldn't the replacing of preempt_disable() with migrate_disable()
> > above resolve Sebastian's issue?
> >
> This for regular kernel only. That means that migrate_disable() is
> equal to preempt_disable(). So, no difference.

But this will force preempt_disable() context into the low-level page
allocator on -RT kernels which I believe is not what Sebastian wants. The
whole reason why the spinlock vs raw-spinlock ordering matters is, because on
RT, the spinlock is sleeping. So if you have:

raw_spin_lock(..);
spin_lock(..);  <-- can sleep on RT, so Sleep while atomic (SWA) violation.

That's the main reason you are dropping the lock before calling the
allocator. But if you call preempt_disable() and drop the lock, then that
may fix the lock ordering violation only to reintroduce the SWA issue, which
is why you wanted to drop the lock in the first place.

migrate_disable() on -RT kernels will not disable preemption which is where
Sebastian is coming from I think:
https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-stable-rt.git/tree/kernel/sched/core.c?h=v5.4-rt#n8178

In your defense, the "don't disable preemption on migrate_disable()" on -RT
part is not upstream yet. So maybe this will not work on upstream PREEMPT_RT,
but I'm not sure whether folks are really running upstream PREEMPT_RT without
going through the RT tree.

I did not see the drawback of just keeping it as migrate_disable() which
should make everyone happy but Sebastian may have other opinions such as
maybe that converting the lock to normal spinlock makes the code work on
upstream's version of PREEMPT_RT without requiring the migrate_disable()
change. But he can elaborate more.

Paul, how does all this sound to you?

thanks,

- Joel
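A sketch of the nesting problem Joel describes above, with made-up lock names (this is illustrative, not code from the patch): on PREEMPT_RT a spinlock_t is a sleeping lock, so taking one while a raw_spinlock_t is held is a sleep-while-atomic violation, and it is exactly this raw_spinlock-versus-spinlock nesting that CONFIG_PROVE_RAW_LOCK_NESTING reports.

<snip>
static DEFINE_RAW_SPINLOCK(outer_raw_lock);
static DEFINE_SPINLOCK(inner_lock);

static void bad_nesting_on_rt(void)
{
	raw_spin_lock(&outer_raw_lock);	/* disables preemption, even on RT */
	spin_lock(&inner_lock);		/* sleeping lock on RT -> not allowed */
	spin_unlock(&inner_lock);
	raw_spin_unlock(&outer_raw_lock);
}
<snip>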
On 2020-07-15 15:14:49 [-0700], Paul E. McKenney wrote:
>
> My concern is that some critical bug will show up at some point
> that requires double-argument kfree_rcu() be invoked while holding
> a raw spinlock. (Single-argument kfree_rcu() must sometimes invoke
> synchronize_rcu(), so it can never be invoked in any state forbidding
> invoking schedule().)

So you are saying as of today we are good but in near future the
following
	synchronize_rcu() -> kfree_rcu()

may be needed?

> Yes, dropping to a plain spinlock would be simple in the here and now,
> but experience indicates that it is only a matter of time, and that when
> that time comes it will come as an emergency.

Hmmm.

> One approach would be to replace the "IS_ENABLED(CONFIG_PREEMPT_RT)"
> with some sort of check for being in a context where spinlock acquisition
> is not legal. What could be done along those lines?

I would rethink the whole concept how this is implemented now and give
it another try. The code does not look pretty and is looking
complicated. The RT covering of this part then just added a simple
return because nothing else seemed to be possible. This patch here
looks like another duct tape attempt to avoid a warning.

> 							Thanx, Paul

Sebastian
On 2020-07-16 11:19:13 [+0200], Uladzislau Rezki wrote:
> Sebastian, could you please confirm that if that patch that is in
> question fixes it?
>
> It would be appreciated!

So that preempt disable should in terms any warnings. However I don't
think that it is strictly needed and from scheduling point of view you
forbid a CPU migration which might be good otherwise.

Also if interrupts and everything is enabled then someone else might
invoke kfree_rcu() from BH context for instance.

Sebastian
On Thu, Jul 16, 2020 at 09:36:47AM -0400, Joel Fernandes wrote:
> On Thu, Jul 16, 2020 at 11:19:13AM +0200, Uladzislau Rezki wrote:
> > On Wed, Jul 15, 2020 at 07:13:33PM -0400, Joel Fernandes wrote:
> > > On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
> > > <bigeasy@linutronix.de> wrote:
> > > >
> > > > On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > > > > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > > 		return false;
> > > > >
> > > > > +	preempt_disable();
> > > > > +	krc_this_cpu_unlock(*krcp, *flags);
> > > >
> > > > Now you enter memory allocator with disabled preemption. This isn't any
> > > > better but we don't have a warning for this yet.
> > > > What happened to the part where I asked for a spinlock_t?
> > >
> > > Ulad,
> > > Wouldn't the replacing of preempt_disable() with migrate_disable()
> > > above resolve Sebastian's issue?
> > >
> > This for regular kernel only. That means that migrate_disable() is
> > equal to preempt_disable(). So, no difference.
>
> But this will force preempt_disable() context into the low-level page
> allocator on -RT kernels which I believe is not what Sebastian wants. The
> whole reason why the spinlock vs raw-spinlock ordering matters is, because on
> RT, the spinlock is sleeping. So if you have:
>
> raw_spin_lock(..);
> spin_lock(..);  <-- can sleep on RT, so Sleep while atomic (SWA) violation.
>
> That's the main reason you are dropping the lock before calling the
> allocator.
>
No. Please read the commit message of this patch. This is for regular kernel.

You did a patch:

<snip>
    if (IS_ENABLED(CONFIG_PREEMPT_RT))
        return false;
<snip>

--
Vlad Rezki
On Thu, Jul 16, 2020 at 04:25:37PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-16 11:19:13 [+0200], Uladzislau Rezki wrote:
> > Sebastian, could you please confirm that if that patch that is in
> > question fixes it?
> >
> > It would be appreciated!
>
> So that preempt disable should in terms any warnings. However I don't
> think that it is strictly needed and from scheduling point of view you
> forbid a CPU migration which might be good otherwise.
>
Please elaborate your point regarding "i do not think it is strictly needed".

Actually i can rework the patch to remove even such preempt_enable/disable
to stay on the same CPU, but i do not see the point of doing it.

Do you see the point?

As for scheduling point of view. Well, there are many places when there
is a demand in memory or pages from atomic context. Also, getting a page
is not considered as a hot path in the kfree_rcu().

> Also if interrupts and everything is enabled then someone else might
> invoke kfree_rcu() from BH context for instance.
>
And what? What is a problem here, please elaborate if you see any issues.

--
Vlad Rezki
On 2020-07-16 16:47:28 [+0200], Uladzislau Rezki wrote:
> On Thu, Jul 16, 2020 at 04:25:37PM +0200, Sebastian Andrzej Siewior wrote:
> > On 2020-07-16 11:19:13 [+0200], Uladzislau Rezki wrote:
> > > Sebastian, could you please confirm that if that patch that is in
> > > question fixes it?
> > >
> > > It would be appreciated!
> >
> > So that preempt disable should in terms any warnings. However I don't
> > think that it is strictly needed and from scheduling point of view you
> > forbid a CPU migration which might be good otherwise.
> >
> Please elaborate your point regarding "i do not think it is strictly needed".
>
> Actually i can rework the patch to remove even such preempt_enable/disable
> to stay on the same CPU, but i do not see the point of doing it.
>
> Do you see the point?

You disable preemption for what reason? It is not documented, it is not
obvious - why is it required?

> As for scheduling point of view. Well, there are many places when there
> is a demand in memory or pages from atomic context. Also, getting a page
> is not considered as a hot path in the kfree_rcu().

If you disable preemption than you assume that you wouldn't be atomic
otherwise. You say that at this point it is not a hot path so if this is
not *that* important why not allow preemption and allow the schedule to
place you somewhere else if the scheduler decides that it is a good idea.

> > Also if interrupts and everything is enabled then someone else might
> > invoke kfree_rcu() from BH context for instance.
> >
> And what? What is a problem here, please elaborate if you see any
> issues.

That the kfree_rcu() caller from BH context will end up here as well,
asking for a page.

Sebastian
On Thu, Jul 16, 2020 at 04:14:21PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-15 15:14:49 [-0700], Paul E. McKenney wrote:
> >
> > My concern is that some critical bug will show up at some point
> > that requires double-argument kfree_rcu() be invoked while holding
> > a raw spinlock. (Single-argument kfree_rcu() must sometimes invoke
> > synchronize_rcu(), so it can never be invoked in any state forbidding
> > invoking schedule().)
>
> So you are saying as of today we are good but in near future the
> following
> 	synchronize_rcu() -> kfree_rcu()
>
> may be needed?

You lost me on this one. I am instead concerned that something like this
might be needed on short notice:

	raw_spin_lock(&some_lock);
	kfree_rcu(some_pointer, some_field_offset);

In contrast, single-argument kfree_rcu() cannot be invoked from any
environment where synchronize_rcu() cannot be invoked.

> > Yes, dropping to a plain spinlock would be simple in the here and now,
> > but experience indicates that it is only a matter of time, and that when
> > that time comes it will come as an emergency.
>
> Hmmm.

I point out the call_rcu() experience.

> > One approach would be to replace the "IS_ENABLED(CONFIG_PREEMPT_RT)"
> > with some sort of check for being in a context where spinlock acquisition
> > is not legal. What could be done along those lines?
>
> I would rethink the whole concept how this is implemented now and give
> it another try. The code does not look pretty and is looking
> complicated. The RT covering of this part then just added a simple
> return because nothing else seemed to be possible. This patch here
> looks like another duct tape attempt to avoid a warning.

In addition to the possibility of invocation from BH?

							Thanx, Paul
On Thu, Jul 16, 2020 at 05:04:14PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-16 16:47:28 [+0200], Uladzislau Rezki wrote:
> > On Thu, Jul 16, 2020 at 04:25:37PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2020-07-16 11:19:13 [+0200], Uladzislau Rezki wrote:
> > > > Sebastian, could you please confirm that if that patch that is in
> > > > question fixes it?
> > > >
> > > > It would be appreciated!
> > >
> > > So that preempt disable should in terms any warnings. However I don't
> > > think that it is strictly needed and from scheduling point of view you
> > > forbid a CPU migration which might be good otherwise.
> > >
> > Please elaborate your point regarding "i do not think it is strictly needed".
> >
> > Actually i can rework the patch to remove even such preempt_enable/disable
> > to stay on the same CPU, but i do not see the point of doing it.
> >
> > Do you see the point?
>
> You disable preemption for what reason? It is not documented, it is not
> obvious - why is it required?
>
I can document it. Will it work for you? Actually i can get rid of it but
there can be other side effects which also can be addressed but i do not
see any issues of doing just "preemtion off". Please have a look at sources
across the kernel how many times a memory is requested in atomic context:

<snip>
preempt_disable() os any spinlock or raw_locks, etc..
    __get_free_page(GFP_NOWAIT | __GFP_NOWARN);
    or kmalloc(GFP_ATOMIC);
<snip>

all those flags say to page allocator or SLAB that sleeping is not allowed.

> > As for scheduling point of view. Well, there are many places when there
> > is a demand in memory or pages from atomic context. Also, getting a page
> > is not considered as a hot path in the kfree_rcu().
>
> If you disable preemption than you assume that you wouldn't be atomic
> otherwise. You say that at this point it is not a hot path so if this is
> not *that* important why not allow preemption and allow the schedule to
>
If i disable preemption, it means that atomic section begins. Let me explain
why i disable preemption. If i invoke a page allocator in full preemptable
context, it can be that we get a page but end up on another CPU. That another
CPU does not need it, because it has some free spots in its internal array
for pointer collecting. If we stay on the same CPU we eliminate it.

The question is what to do with that page. I see two solutions.

1) Queue it to the CPU2 page cache for further reuse or freeing.
2) Just proceed with newly allocated page thinking that previous "bulk arry"
   is partly populated, i.e. it can be that previous one has plenty of free
   spots.

Actually that is why i want to stay on the same CPU.

> place you somewhere else if the scheduler decides that it is a good idea.
>
It is not a hot path, really. I do not consider it as critical, since the
page allocator will not do much work(we use restricted flags), on a high
level it is limited to just ask a page and return it. If no page, check
watermark, if low, wakeup kswapd and return NULL, that is it.

> > > Also if interrupts and everything is enabled then someone else might
> > > invoke kfree_rcu() from BH context for instance.
> > >
> > And what? What is a problem here, please elaborate if you see any
> > issues.
>
> That the kfree_rcu() caller from BH context will end up here as well,
> asking for a page.
>
Please think about that CPU0 is somewhere in __get_free_page(), when it is
still there, there comes an interrupt that also calls kfree_rcu() and end up
somewhere in __get_free_page(). To prevent such internal critical sections
usually the code disables irqs and do some critical things to prevent of
breaking something. So, basically __get_free_page() can be interrupted and
being invoked one more time on the same CPU. It uses spin_lockirqsave() for
such scenarios. Our internal lock is dropped.

--
Vlad Rezki
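The CPU-migration concern explained above can be sketched as follows (illustrative only; it reuses the helper names from the posted diff but is not the submitted code): without pinning, the task may be preempted and migrated between dropping the per-CPU lock and re-acquiring it, so a page allocated while running on one CPU may end up being offered to another CPU's krcp, which might already have free slots.

<snip>
	/* Running on CPU X, holding CPU X's krcp lock. */
	krc_this_cpu_unlock(*krcp, *flags);

	/* Fully preemptible here: the scheduler may move us to CPU Y. */
	bnode = (struct kvfree_rcu_bulk_data *)
		__get_free_page(GFP_NOWAIT | __GFP_NOWARN);

	/* May now return CPU Y's krcp, not the one the page was meant for. */
	*krcp = krc_this_cpu_lock(flags);
<snip>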
On 2020-07-16 08:20:27 [-0700], Paul E. McKenney wrote:
> You lost me on this one. I am instead concerned that something like this
> might be needed on short notice:
>
> 	raw_spin_lock(&some_lock);
> 	kfree_rcu(some_pointer, some_field_offset);
>
> In contrast, single-argument kfree_rcu() cannot be invoked from any
> environment where synchronize_rcu() cannot be invoked.

I see. We don't have any kfree() in that context as far as I remember.
We had a few cases in "resize" where you allocate memory, copy content
and free old memory while under the lock but they are gone.

> > > Yes, dropping to a plain spinlock would be simple in the here and now,
> > > but experience indicates that it is only a matter of time, and that when
> > > that time comes it will come as an emergency.
> >
> > Hmmm.
>
> I point out the call_rcu() experience.
>
> > > One approach would be to replace the "IS_ENABLED(CONFIG_PREEMPT_RT)"
> > > with some sort of check for being in a context where spinlock acquisition
> > > is not legal. What could be done along those lines?
> >
> > I would rethink the whole concept how this is implemented now and give
> > it another try. The code does not look pretty and is looking
> > complicated. The RT covering of this part then just added a simple
> > return because nothing else seemed to be possible. This patch here
> > looks like another duct tape attempt to avoid a warning.
>
> In addition to the possibility of invocation from BH?

Invocation from BH should be possible because network would probably be
the first user. I don't remember anything wrong with BH if I remember
correctly.

> 							Thanx, Paul

Sebastian
On Thu, Jul 16, 2020 at 05:36:38PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-16 08:20:27 [-0700], Paul E. McKenney wrote:
> > You lost me on this one. I am instead concerned that something like this
> > might be needed on short notice:
> >
> > 	raw_spin_lock(&some_lock);
> > 	kfree_rcu(some_pointer, some_field_offset);
> >
> > In contrast, single-argument kfree_rcu() cannot be invoked from any
> > environment where synchronize_rcu() cannot be invoked.
>
> I see. We don't have any kfree() in that context as far as I remember.
> We had a few cases in "resize" where you allocate memory, copy content
> and free old memory while under the lock but they are gone.

True, but we also didn't have any calls to call_rcu() prior to the call
to rcu_init() until suddenly we did. (Yeah, I could have put my foot
down and prohibited that practice, but the workarounds were quite a bit
more complicated than just making call_rcu() work during very early
boot.)

And last I checked, there really were calls to call_rcu() under raw
spinlocks, so the potential or calls to double-argument kfree_rcu()
clearly exists and is very real.

> > > > Yes, dropping to a plain spinlock would be simple in the here and now,
> > > > but experience indicates that it is only a matter of time, and that when
> > > > that time comes it will come as an emergency.
> > >
> > > Hmmm.
> >
> > I point out the call_rcu() experience.
> >
> > > > One approach would be to replace the "IS_ENABLED(CONFIG_PREEMPT_RT)"
> > > > with some sort of check for being in a context where spinlock acquisition
> > > > is not legal. What could be done along those lines?
> > >
> > > I would rethink the whole concept how this is implemented now and give
> > > it another try. The code does not look pretty and is looking
> > > complicated. The RT covering of this part then just added a simple
> > > return because nothing else seemed to be possible. This patch here
> > > looks like another duct tape attempt to avoid a warning.
> >
> > In addition to the possibility of invocation from BH?
>
> Invocation from BH should be possible because network would probably be
> the first user. I don't remember anything wrong with BH if I remember
> correctly.

OK, that is reassuring. Here is hoping!

							Thanx, Paul
On Thu, Jul 16, 2020 at 04:37:14PM +0200, Uladzislau Rezki wrote:
> On Thu, Jul 16, 2020 at 09:36:47AM -0400, Joel Fernandes wrote:
> > On Thu, Jul 16, 2020 at 11:19:13AM +0200, Uladzislau Rezki wrote:
> > > On Wed, Jul 15, 2020 at 07:13:33PM -0400, Joel Fernandes wrote:
> > > > On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
> > > > <bigeasy@linutronix.de> wrote:
> > > > >
> > > > > On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > > > > > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > > > 		return false;
> > > > > >
> > > > > > +	preempt_disable();
> > > > > > +	krc_this_cpu_unlock(*krcp, *flags);
> > > > >
> > > > > Now you enter memory allocator with disabled preemption. This isn't any
> > > > > better but we don't have a warning for this yet.
> > > > > What happened to the part where I asked for a spinlock_t?
> > > >
> > > > Ulad,
> > > > Wouldn't the replacing of preempt_disable() with migrate_disable()
> > > > above resolve Sebastian's issue?
> > > >
> > > This for regular kernel only. That means that migrate_disable() is
> > > equal to preempt_disable(). So, no difference.
> >
> > But this will force preempt_disable() context into the low-level page
> > allocator on -RT kernels which I believe is not what Sebastian wants. The
> > whole reason why the spinlock vs raw-spinlock ordering matters is, because on
> > RT, the spinlock is sleeping. So if you have:
> >
> > raw_spin_lock(..);
> > spin_lock(..);  <-- can sleep on RT, so Sleep while atomic (SWA) violation.
> >
> > That's the main reason you are dropping the lock before calling the
> > allocator.
> >
> No. Please read the commit message of this patch. This is for regular kernel.

Wait, so what is the hesitation to put migrate_disable() here? It is even
further documentation (annotation) that the goal here is to stay on the same
CPU - as you indicated in later emails.

And the documentation aspect is also something Sebastian brought. A plain
preempt_disable() is frowned up if there are alternative API that document
the usage.

> You did a patch:
>
> <snip>
>     if (IS_ENABLED(CONFIG_PREEMPT_RT))
>         return false;
> <snip>

I know, that's what we're discussing.

So again, why the hatred for migrate_disable() ? :)

thanks,

- Joel
On Thu, Jul 16, 2020 at 02:27:07PM -0400, Joel Fernandes wrote:
> On Thu, Jul 16, 2020 at 04:37:14PM +0200, Uladzislau Rezki wrote:
> > On Thu, Jul 16, 2020 at 09:36:47AM -0400, Joel Fernandes wrote:
> > > On Thu, Jul 16, 2020 at 11:19:13AM +0200, Uladzislau Rezki wrote:
> > > > On Wed, Jul 15, 2020 at 07:13:33PM -0400, Joel Fernandes wrote:
> > > > > On Wed, Jul 15, 2020 at 2:56 PM Sebastian Andrzej Siewior
> > > > > <bigeasy@linutronix.de> wrote:
> > > > > >
> > > > > > On 2020-07-15 20:35:37 [+0200], Uladzislau Rezki (Sony) wrote:
> > > > > > > @@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
> > > > > > > 	if (IS_ENABLED(CONFIG_PREEMPT_RT))
> > > > > > > 		return false;
> > > > > > >
> > > > > > > +	preempt_disable();
> > > > > > > +	krc_this_cpu_unlock(*krcp, *flags);
> > > > > >
> > > > > > Now you enter memory allocator with disabled preemption. This isn't any
> > > > > > better but we don't have a warning for this yet.
> > > > > > What happened to the part where I asked for a spinlock_t?
> > > > >
> > > > > Ulad,
> > > > > Wouldn't the replacing of preempt_disable() with migrate_disable()
> > > > > above resolve Sebastian's issue?
> > > > >
> > > > This for regular kernel only. That means that migrate_disable() is
> > > > equal to preempt_disable(). So, no difference.
> > >
> > > But this will force preempt_disable() context into the low-level page
> > > allocator on -RT kernels which I believe is not what Sebastian wants. The
> > > whole reason why the spinlock vs raw-spinlock ordering matters is, because on
> > > RT, the spinlock is sleeping. So if you have:
> > >
> > > raw_spin_lock(..);
> > > spin_lock(..);  <-- can sleep on RT, so Sleep while atomic (SWA) violation.
> > >
> > > That's the main reason you are dropping the lock before calling the
> > > allocator.
> > >
> > No. Please read the commit message of this patch. This is for regular kernel.
>
> Wait, so what is the hesitation to put migrate_disable() here? It is even
> further documentation (annotation) that the goal here is to stay on the same
> CPU - as you indicated in later emails.
>
Actually preempt_disable() does the same for !RT. I agree that
migrate_disable() annotation looks better from the point you mentioned.

> And the documentation aspect is also something Sebastian brought. A plain
> preempt_disable() is frowned up if there are alternative API that document
> the usage.
>
> > You did a patch:
> >
> > <snip>
> >     if (IS_ENABLED(CONFIG_PREEMPT_RT))
> >         return false;
> > <snip>
>
> I know, that's what we're discussing.
>
> So again, why the hatred for migrate_disable() ? :)
>
Let's do migrate_disable(), i do not mind :)

--
Vlad Rezki
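For reference, a sketch of what the hunk might look like with the change agreed on just above, i.e. migrate_disable()/migrate_enable() in place of preempt_disable()/preempt_enable() (this is a sketch of the expected follow-up, not a posted patch; it reuses the names from the diff below):

<snip>
	migrate_disable();
	krc_this_cpu_unlock(*krcp, *flags);

	/*
	 * The task stays on this CPU (and remains preemptible on RT),
	 * so the page below is allocated for this CPU's krcp.
	 */
	bnode = (struct kvfree_rcu_bulk_data *)
		__get_free_page(GFP_NOWAIT | __GFP_NOWARN);

	*krcp = krc_this_cpu_lock(flags);
	migrate_enable();
<snip>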
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 21c2fa5bd8c3..7469bd1e5c2c 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3278,21 +3278,22 @@ static void kfree_rcu_monitor(struct work_struct *work)
 }
 
 static inline bool
-kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
+add_ptr_to_bulk_krc_lock(struct kfree_rcu_cpu **krcp,
+	unsigned long *flags, void *ptr)
 {
 	struct kvfree_rcu_bulk_data *bnode;
 	int idx;
 
-	if (unlikely(!krcp->initialized))
+	*krcp = krc_this_cpu_lock(flags);
+	if (unlikely(!(*krcp)->initialized))
 		return false;
 
-	lockdep_assert_held(&krcp->lock);
 	idx = !!is_vmalloc_addr(ptr);
 
 	/* Check if a new block is required. */
-	if (!krcp->bkvhead[idx] ||
-		krcp->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
-		bnode = get_cached_bnode(krcp);
+	if (!(*krcp)->bkvhead[idx] ||
+		(*krcp)->bkvhead[idx]->nr_records == KVFREE_BULK_MAX_ENTR) {
+		bnode = get_cached_bnode(*krcp);
 		if (!bnode) {
 			/*
 			 * To keep this path working on raw non-preemptible
@@ -3306,6 +3307,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 			if (IS_ENABLED(CONFIG_PREEMPT_RT))
 				return false;
 
+			preempt_disable();
+			krc_this_cpu_unlock(*krcp, *flags);
+
 			/*
 			 * NOTE: For one argument of kvfree_rcu() we can
 			 * drop the lock and get the page in sleepable
@@ -3315,6 +3319,9 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 			 */
 			bnode = (struct kvfree_rcu_bulk_data *)
 				__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+			*krcp = krc_this_cpu_lock(flags);
+			preempt_enable();
 		}
 
 		/* Switch to emergency path. */
@@ -3323,15 +3330,15 @@ kvfree_call_rcu_add_ptr_to_bulk(struct kfree_rcu_cpu *krcp, void *ptr)
 
 		/* Initialize the new block. */
 		bnode->nr_records = 0;
-		bnode->next = krcp->bkvhead[idx];
+		bnode->next = (*krcp)->bkvhead[idx];
 
 		/* Attach it to the head. */
-		krcp->bkvhead[idx] = bnode;
+		(*krcp)->bkvhead[idx] = bnode;
 	}
 
 	/* Finally insert. */
-	krcp->bkvhead[idx]->records
-		[krcp->bkvhead[idx]->nr_records++] = ptr;
+	(*krcp)->bkvhead[idx]->records
+		[(*krcp)->bkvhead[idx]->nr_records++] = ptr;
 
 	return true;
 }
@@ -3369,24 +3376,19 @@ void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func)
 		ptr = (unsigned long *) func;
 	}
 
-	krcp = krc_this_cpu_lock(&flags);
-
 	// Queue the object but don't yet schedule the batch.
 	if (debug_rcu_head_queue(ptr)) {
 		// Probable double kfree_rcu(), just leak.
 		WARN_ONCE(1, "%s(): Double-freed call. rcu_head %p\n",
 			  __func__, head);
-
-		// Mark as success and leave.
-		success = true;
-		goto unlock_return;
+		return;
 	}
 
 	/*
 	 * Under high memory pressure GFP_NOWAIT can fail,
 	 * in that case the emergency path is maintained.
	 */
-	success = kvfree_call_rcu_add_ptr_to_bulk(krcp, ptr);
+	success = add_ptr_to_bulk_krc_lock(&krcp, &flags, ptr);
 	if (!success) {
 		if (head == NULL)
 			// Inline if kvfree_rcu(one_arg) call.
If the kernel is built with the CONFIG_PROVE_RAW_LOCK_NESTING option,
lockdep complains about a violation of the nesting rules, because it
performs raw_spinlock vs. spinlock nesting checks. Internally kfree_rcu()
uses a raw_spinlock_t, whereas the page allocator internally deals with a
spinlock_t to access its zones. In order to prevent the violation in
question, drop the internal raw_spinlock_t before entering the page
allocator.

Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
 kernel/rcu/tree.c | 36 +++++++++++++++++++-----------------
 1 file changed, 19 insertions(+), 17 deletions(-)