Message ID | 20210601062303.3932513-1-npiggin@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | shoot lazy tlbs | expand |
On 5/31/21 11:22 PM, Nicholas Piggin wrote: > There haven't been objections to the series since last posting, this > is just a rebase and tidies up a few comments minor patch rearranging. > I continue to object to having too many modes. I like my more generic improvements better. Let me try to find some time to email again.
On 6/4/21 9:54 AM, Andy Lutomirski wrote: > On 5/31/21 11:22 PM, Nicholas Piggin wrote: >> There haven't been objections to the series since last posting, this >> is just a rebase and tidies up a few comments minor patch rearranging. >> > > I continue to object to having too many modes. I like my more generic > improvements better. Let me try to find some time to email again. > Specifically, this: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm I, or someone, needs to dust off my membarrier series before any of these kinds of changes get made. The barrier situation in the scheduler is too confusing otherwise.
Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am: > On 6/4/21 9:54 AM, Andy Lutomirski wrote: >> On 5/31/21 11:22 PM, Nicholas Piggin wrote: >>> There haven't been objections to the series since last posting, this >>> is just a rebase and tidies up a few comments minor patch rearranging. >>> >> >> I continue to object to having too many modes. I like my more generic >> improvements better. Let me try to find some time to email again. >> > > Specifically, this: > > https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm That's worse than what powerpc does with the shoot lazies code so we wouldn't use it anyway. The fact is mm-cpumask and lazy mm is very architecture specific, so I don't really see that another "mode" is such a problem, it's for the most part "this is what powerpc does" -> "this is what powerpc does". The only mode in the context switch is just "take a ref on the lazy mm" or "don't take a ref". Surely that's not too onerous to add!? Actually the bigger part of it is actually the no-lazy mmu mode which is not yet used, I thought it was a neat little demonstrator of how code works with/without lazy but I will get rid of that for submission. > I, or someone, needs to dust off my membarrier series before any of > these kinds of changes get made. The barrier situation in the scheduler > is too confusing otherwise. > I disagree, I've disentangled the changes from membarrier stuff now, they can be done concurrently. Thanks, Nick
Excerpts from Nicholas Piggin's message of June 5, 2021 10:17 am: > Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am: >> On 6/4/21 9:54 AM, Andy Lutomirski wrote: >>> On 5/31/21 11:22 PM, Nicholas Piggin wrote: >>>> There haven't been objections to the series since last posting, this >>>> is just a rebase and tidies up a few comments minor patch rearranging. >>>> >>> >>> I continue to object to having too many modes. I like my more generic >>> improvements better. Let me try to find some time to email again. >>> >> >> Specifically, this: >> >> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm > > That's worse than what powerpc does with the shoot lazies code so > we wouldn't use it anyway. > > The fact is mm-cpumask and lazy mm is very architecture specific, so I > don't really see that another "mode" is such a problem, it's for the > most part "this is what powerpc does" -> "this is what powerpc does". > The only mode in the context switch is just "take a ref on the lazy mm" > or "don't take a ref". Surely that's not too onerous to add!? > > Actually the bigger part of it is actually the no-lazy mmu mode which > is not yet used, I thought it was a neat little demonstrator of how code > works with/without lazy but I will get rid of that for submission. I admit that does add a bit more churn than necessary maybe that was the main objection. Here is the entire kernel/sched/core.c change after that is removed. Pretty simple now. I'll resubmit. Thanks, Nick diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e359c76ea2e2..1be0b97e12ec 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4171,7 +4171,7 @@ static struct rq *finish_task_switch(struct task_struct *prev) __releases(rq->lock) { struct rq *rq = this_rq(); - struct mm_struct *mm = rq->prev_mm; + struct mm_struct *mm = NULL; long prev_state; /* @@ -4190,7 +4190,10 @@ static struct rq *finish_task_switch(struct task_struct *prev) current->comm, current->pid, preempt_count())) preempt_count_set(FORK_PREEMPT_COUNT); - rq->prev_mm = NULL; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + mm = rq->prev_lazy_mm; + rq->prev_lazy_mm = NULL; +#endif /* * A task struct has one reference for the use as "current". @@ -4326,9 +4329,21 @@ context_switch(struct rq *rq, struct task_struct *prev, switch_mm_irqs_off(prev->active_mm, next->mm, next); if (!prev->mm) { // from kernel - /* will mmdrop_lazy_tlb() in finish_task_switch(). */ - rq->prev_mm = prev->active_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + /* Will mmdrop_lazy_tlb() in finish_task_switch(). */ + rq->prev_lazy_mm = prev->active_mm; prev->active_mm = NULL; +#else + /* + * Without MMU_LAZY_TLB_REFCOUNT there is no lazy + * tracking (because no rq->prev_lazy_mm) in + * finish_task_switch, so no mmdrop_lazy_tlb(), + * so no memory barrier for membarrier (see the + * membarrier comment in finish_task_switch()). + * Do it here. + */ + smp_mb(); +#endif } } diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index a189bec13729..0729cf19a987 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -961,7 +961,9 @@ struct rq { struct task_struct *idle; struct task_struct *stop; unsigned long next_balance; - struct mm_struct *prev_mm; +#ifdef CONFIG_MMU_LAZY_TLB_REFCOUNT + struct mm_struct *prev_lazy_mm; +#endif unsigned int clock_update_flags; u64 clock;
Excerpts from Nicholas Piggin's message of June 5, 2021 10:26 am: > Excerpts from Nicholas Piggin's message of June 5, 2021 10:17 am: >> Excerpts from Andy Lutomirski's message of June 5, 2021 3:05 am: >>> On 6/4/21 9:54 AM, Andy Lutomirski wrote: >>>> On 5/31/21 11:22 PM, Nicholas Piggin wrote: >>>>> There haven't been objections to the series since last posting, this >>>>> is just a rebase and tidies up a few comments minor patch rearranging. >>>>> >>>> >>>> I continue to object to having too many modes. I like my more generic >>>> improvements better. Let me try to find some time to email again. >>>> >>> >>> Specifically, this: >>> >>> https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm >> >> That's worse than what powerpc does with the shoot lazies code so >> we wouldn't use it anyway. >> >> The fact is mm-cpumask and lazy mm is very architecture specific, so I >> don't really see that another "mode" is such a problem, it's for the >> most part "this is what powerpc does" -> "this is what powerpc does". >> The only mode in the context switch is just "take a ref on the lazy mm" >> or "don't take a ref". Surely that's not too onerous to add!? >> >> Actually the bigger part of it is actually the no-lazy mmu mode which >> is not yet used, I thought it was a neat little demonstrator of how code >> works with/without lazy but I will get rid of that for submission. > > I admit that does add a bit more churn than necessary maybe that was > the main objection. > > Here is the entire kernel/sched/core.c change after that is removed. > Pretty simple now. I'll resubmit. If it gives you some concerns about a great complex new mode, I'll put it a different way -- all this allows is the arch to say that it does not use lazy tlb mms beyond their refcounted lifetime, so there is no need to refcount the lazy tlb reference. That's all it is. One implementation of that is shoot lazies, and that could be done entirely in arch/powerpc via destroy_context (I just put it in mm/ in case it is useful to others, but that's no real difference). So you see it's really just about management of lazies, the refcounting is just a bit on the side. And lazy management is highly arch specific, x86 being one of the really different complex ones there including very complex and unique interactions with membarrier ordering, so that can't be a fair objection. Thanks, Nick