Message ID | 20240619080940.2690756-5-maobibo@loongson.cn (mailing list archive) |
---|---|
State | New, archived |
Series | LoongArch: KVM: Fix some issues relative with mmu |
Hi, Bibo,

On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
>
> When updating pmd entry such as allocating new pmd page or splitting
> huge page into normal page, it is necessary to firstly update all pte
> entries, and then update pmd entry.
>
> It is weak order with LoongArch system, there will be problem if other
> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
> is added to assure this.

Memory barriers should be in pairs in most cases. That means you may
lose smp_rmb() in another place.

Huacai
On 2024/6/23 6:18 PM, Huacai Chen wrote:
> Hi, Bibo,
>
> On Wed, Jun 19, 2024 at 4:09 PM Bibo Mao <maobibo@loongson.cn> wrote:
>>
>> It is weak order with LoongArch system, there will be problem if other
>> vcpus sees pmd update firstly however pte is not updated. Here smp_wmb()
>> is added to assure this.
> Memory barriers should be in pairs in most cases. That means you may
> lose smp_rmb() in another place.

The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
in file mm/huge_memory.c, and the explanation is reasonable.

    ...
            set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
    }
    ...
    smp_wmb(); /* make pte visible before pmd */
    pmd_populate(mm, pmd, pgtable);

It is strange that why smp_rmb() should be in pairs with smp_wmb(),
I never hear this rule -:(

Regards
Bibo Mao
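For reference, a minimal userspace sketch of the writer-side pattern being cited here: fill every pte slot first, issue the equivalent of smp_wmb(), and only then publish the pmd. C11 atomics stand in for the kernel primitives; the array size, variable names and split_and_publish() are illustrative only, not kernel APIs.

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 512

static uint64_t pte_page[PTRS_PER_PTE];   /* stand-in for the newly allocated pte page */
static _Atomic(uint64_t *) pmd;           /* stand-in for the pmd slot being populated */

/* Writer side: populate all "pte" entries, then publish the "pmd". */
void split_and_publish(uint64_t first_val)
{
	for (size_t i = 0; i < PTRS_PER_PTE; i++)
		pte_page[i] = first_val + i * 4096;  /* fill every pte entry first */

	/* Equivalent of smp_wmb(): order the pte stores before the pmd store. */
	atomic_thread_fence(memory_order_release);

	/* A reader that observes this pointer must also observe the pte stores. */
	atomic_store_explicit(&pmd, pte_page, memory_order_relaxed);
}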
On Mon, Jun 24, 2024 at 9:37 AM maobibo <maobibo@loongson.cn> wrote:
> The idea adding smp_wmb() comes from function __split_huge_pmd_locked()
> in file mm/huge_memory.c, and the explanation is reasonable.
>
> It is strange that why smp_rmb() should be in pairs with smp_wmb(),
> I never hear this rule -:(

https://docs.kernel.org/core-api/wrappers/memory-barriers.html

SMP BARRIER PAIRING
-------------------

When dealing with CPU-CPU interactions, certain types of memory barrier should
always be paired. A lack of appropriate pairing is almost certainly an error.

Huacai
On 2024/6/24 9:56 AM, Huacai Chen wrote:
> https://docs.kernel.org/core-api/wrappers/memory-barriers.html
>
> SMP BARRIER PAIRING
> -------------------
>
> When dealing with CPU-CPU interactions, certain types of memory barrier should
> always be paired. A lack of appropriate pairing is almost certainly an error.

    CPU 1                     CPU 2
    ===============           ===============
    WRITE_ONCE(a, 1);
    <write barrier>
    WRITE_ONCE(b, 2);         x = READ_ONCE(b);
                              <read barrier>
                              y = READ_ONCE(a);

With split_huge scenery to update pte/pmd entry, there is no strong
relationship between address ptex and pmd.

    CPU 1
    WRITE_ONCE(pte0, 1);
    WRITE_ONCE(pte511, 1);
    <write barrier>
    WRITE_ONCE(pmd, 2);

However with page table walk scenery, address ptep depends on the
contents of pmd, so it is not necessary to add smp_rmb().

    ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
    if (!ptep)
            return no_page_table(vma, flags, address);
    pte = ptep_get(ptep);
    if (!pte_present(pte))
            ...

It is just my option, or do you think where smp_rmb() barrier should be
added in page table reader path?

Regards
Bibo Mao
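A matching reader-side sketch of the page-table-walk argument above: the pte address is computed from the value loaded from the pmd, so the two loads are linked by an address dependency and no explicit read barrier is used. In portable C11 the closest stand-in is memory_order_consume (currently promoted to acquire by compilers); the names continue the writer-side sketch and are not kernel APIs.

#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

extern _Atomic(uint64_t *) pmd;   /* published by the writer-side sketch */

/*
 * Reader side: walk "pmd" -> "pte".  The address of the pte load is
 * derived from the value read from the pmd; that address dependency is
 * what the thread relies on instead of an smp_rmb().
 */
uint64_t walk_one(size_t index)
{
	uint64_t *pte_page;

	pte_page = atomic_load_explicit(&pmd, memory_order_consume);
	if (!pte_page)
		return 0;                /* pmd not populated yet */

	/* Dependent load: cannot be satisfied before pte_page is known. */
	return pte_page[index];
}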
On Mon, Jun 24, 2024 at 10:21 AM maobibo <maobibo@loongson.cn> wrote:
> However with page table walk scenery, address ptep depends on the
> contents of pmd, so it is not necessary to add smp_rmb().
>
> It is just my option, or do you think where smp_rmb() barrier should be
> added in page table reader path?

There are some possibilities:
1. Read barrier is missing in some places;
2. Write barrier is also unnecessary here;
3. Read barrier is really unnecessary, but there is a better API to
   replace the write barrier;
4. Read barrier is really unnecessary, and write barrier is really the
   best API here.

Maybe Rui Wang knows better here.

Huacai
Hi,

On Mon, Jun 24, 2024 at 12:18 PM Huacai Chen <chenhuacai@kernel.org> wrote:
> There are some possibilities:
> 1. Read barrier is missing in some places;
> 2. Write barrier is also unnecessary here;
> 3. Read barrier is really unnecessary, but there is a better API to
>    replace the write barrier;
> 4. Read barrier is really unnecessary, and write barrier is really the
>    best API here.
>
> Maybe Rui Wang knows better here.

It appears that reading the pte address is data-dependent on the pmd,
rather than control-dependent. This creates an opportunity to omit the
read-side memory barrier.

Cheers,
-Rui
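To illustrate the distinction drawn here, a small sketch contrasting a control dependency with an address (data) dependency; relaxed C11 atomics stand in for READ_ONCE(), and the variables are hypothetical. At the hardware level a load that is merely control-dependent on an earlier load may still be satisfied early, while an address-dependent load needs the earlier load's value, which is why the pmd-to-pte walk being data-dependent matters. (Strictly portable C would still use consume or acquire; the kernel expresses the dependency with READ_ONCE() plus per-architecture ordering rules.)

#include <stdatomic.h>
#include <stdint.h>

static _Atomic int flag;
static _Atomic uint64_t data;
static _Atomic(uint64_t *) ptr;

/* Control dependency only: the load of "data" may be reordered before "flag". */
uint64_t read_control_dependent(void)
{
	if (atomic_load_explicit(&flag, memory_order_relaxed))
		return atomic_load_explicit(&data, memory_order_relaxed);
	return 0;
}

/* Address dependency: the second load needs the pointer value first. */
uint64_t read_address_dependent(void)
{
	uint64_t *p = atomic_load_explicit(&ptr, memory_order_relaxed);
	return p ? *p : 0;
}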
diff --git a/arch/loongarch/kvm/mmu.c b/arch/loongarch/kvm/mmu.c
index 1690828bd44b..7f04edfbe428 100644
--- a/arch/loongarch/kvm/mmu.c
+++ b/arch/loongarch/kvm/mmu.c
@@ -163,6 +163,7 @@ static kvm_pte_t *kvm_populate_gpa(struct kvm *kvm,
 
 		child = kvm_mmu_memory_cache_alloc(cache);
 		_kvm_pte_init(child, ctx.invalid_ptes[ctx.level - 1]);
+		smp_wmb(); /* make pte visible before pmd */
 		kvm_set_pte(entry, __pa(child));
 	} else if (kvm_pte_huge(*entry)) {
 		return entry;
@@ -746,6 +747,7 @@ static kvm_pte_t *kvm_split_huge(struct kvm_vcpu *vcpu, kvm_pte_t *ptep, gfn_t g
 		val += PAGE_SIZE;
 	}
 
+	smp_wmb();
 	/* The later kvm_flush_tlb_gpa() will flush hugepage tlb */
 	kvm_set_pte(ptep, __pa(child));
When updating a pmd entry, such as allocating a new pmd page or splitting
a huge page into normal pages, it is necessary to update all pte entries
first, and then update the pmd entry.

LoongArch systems are weakly ordered, so there will be a problem if another
vCPU sees the pmd update before the pte entries are updated. smp_wmb() is
added here to ensure this ordering.

Signed-off-by: Bibo Mao <maobibo@loongson.cn>
---
 arch/loongarch/kvm/mmu.c | 2 ++
 1 file changed, 2 insertions(+)