Message ID | 1510298286-30952-1-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 10.11.17 at 08:18, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4844,9 +4844,19 @@ int map_pages_to_xen(
>      {
>          unsigned long base_mfn;
>
> -        pl1e = l2e_to_l1e(*pl2e);
>          if ( locking )
>              spin_lock(&map_pgdir_lock);
> +
> +        /* Skip the re-consolidation if it's done on another CPU. */
> +        if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
> +        {
> +            if ( locking )
> +                spin_unlock(&map_pgdir_lock);
> +            goto check_l3;
> +        }
> +
> +        ol2e = *pl2e;

This wants moving up ahead of the if(), so you can use the local
variable inside that if().

> @@ -4880,6 +4889,15 @@ int map_pages_to_xen(
>
>          if ( locking )
>              spin_lock(&map_pgdir_lock);
> +
> +        /* Skip the re-consolidation if it's done on another CPU. */
> +        if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
> +        {
> +            if ( locking )
> +                spin_unlock(&map_pgdir_lock);
> +            continue;
> +        }
> +
>          ol3e = *pl3e;

Same here - move the if() below here and use ol3e in there.

With that
Reviewed-by: Jan Beulich <jbeulich@suse.com>

I'm not certain this is important enough a fix to consider for 4.10,
and you seem to think it's good enough if this is applied only after
the tree has been branched, as you didn't Cc Julien. Please indicate
whether you simply weren't aware, or whether there is indeed an
important aspect to this that I'm overlooking.

Jan
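A sketch of one way to apply the suggestion above: read the L2 entry
into ol2e right after taking the lock and test the local copy, so the
PSE check and the later consolidation work on the same snapshot. This
is an illustration only, reusing the identifiers from the patch; it is
not the committed code and is not compilable on its own:

        if ( locking )
            spin_lock(&map_pgdir_lock);

        ol2e = *pl2e;

        /* Skip the re-consolidation if it's done on another CPU. */
        if ( l2e_get_flags(ol2e) & _PAGE_PSE )
        {
            if ( locking )
                spin_unlock(&map_pgdir_lock);
            goto check_l3;
        }

        pl1e = l2e_to_l1e(ol2e);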
On 11/10/2017 5:49 PM, Jan Beulich wrote:
>>>> On 10.11.17 at 08:18, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4844,9 +4844,19 @@ int map_pages_to_xen(
>>      {
>>          unsigned long base_mfn;
>>
>> -        pl1e = l2e_to_l1e(*pl2e);
>>          if ( locking )
>>              spin_lock(&map_pgdir_lock);
>> +
>> +        /* Skip the re-consolidation if it's done on another CPU. */
>> +        if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
>> +        {
>> +            if ( locking )
>> +                spin_unlock(&map_pgdir_lock);
>> +            goto check_l3;
>> +        }
>> +
>> +        ol2e = *pl2e;
> This wants moving up ahead of the if(), so you can use the local
> variable inside that if().

Got it. Thanks.

>> @@ -4880,6 +4889,15 @@ int map_pages_to_xen(
>>
>>          if ( locking )
>>              spin_lock(&map_pgdir_lock);
>> +
>> +        /* Skip the re-consolidation if it's done on another CPU. */
>> +        if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
>> +        {
>> +            if ( locking )
>> +                spin_unlock(&map_pgdir_lock);
>> +            continue;
>> +        }
>> +
>>          ol3e = *pl3e;
> Same here - move the if() below here and use ol3e in there.
>
> With that
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>
> I'm not certain this is important enough a fix to consider for 4.10,
> and you seem to think it's good enough if this is applied only after
> the tree has been branched, as you didn't Cc Julien. Please indicate
> whether you simply weren't aware, or whether there is indeed an
> important aspect to this that I'm overlooking.

Well, at first I did not expect this to be accepted for 4.10. But since
we have met this issue in practice, when running a graphics application
that consumes memory intensively in dom0, I think it also makes sense
to fix it in a Xen release as early as possible. Do you think this is a
reasonable request? :-)

Also Cc to Julien. Julien, any comment? :-)

Yu

> Jan
>>> On 10.11.17 at 15:05, <yu.c.zhang@linux.intel.com> wrote:
> On 11/10/2017 5:49 PM, Jan Beulich wrote:
>> I'm not certain this is important enough a fix to consider for 4.10,
>> and you seem to think it's good enough if this is applied only after
>> the tree has been branched, as you didn't Cc Julien. Please indicate
>> whether you simply weren't aware, or whether there is indeed an
>> important aspect to this that I'm overlooking.
>
> Well, at first I did not expect this to be accepted for 4.10. But since
> we have met this issue in practice, when running a graphics application
> that consumes memory intensively in dom0, I think it also makes sense
> to fix it in a Xen release as early as possible. Do you think this is a
> reasonable request? :-)

You'd need to provide further details for us to understand the
scenario. It obviously depends on whether you have other patches to Xen
which actually trigger this. If the problem can be triggered from
outside of a vanilla upstream Xen, then yes, I think I would favor the
fixes being included.

Jan
On 11/13/2017 5:31 PM, Jan Beulich wrote:
>>>> On 10.11.17 at 15:05, <yu.c.zhang@linux.intel.com> wrote:
>> On 11/10/2017 5:49 PM, Jan Beulich wrote:
>>> I'm not certain this is important enough a fix to consider for 4.10,
>>> and you seem to think it's good enough if this is applied only after
>>> the tree has been branched, as you didn't Cc Julien. Please indicate
>>> whether you simply weren't aware, or whether there is indeed an
>>> important aspect to this that I'm overlooking.
>> Well, at first I did not expect this to be accepted for 4.10. But since
>> we have met this issue in practice, when running a graphics application
>> that consumes memory intensively in dom0, I think it also makes sense
>> to fix it in a Xen release as early as possible. Do you think this is a
>> reasonable request? :-)
> You'd need to provide further details for us to understand the
> scenario. It obviously depends on whether you have other patches to Xen
> which actually trigger this. If the problem can be triggered from
> outside of a vanilla upstream Xen, then yes, I think I would favor the
> fixes being included.

Thanks, Jan. Let me try to give an explanation of the scenario. :-)

We saw an ASSERT failure on

    ASSERT((page->count_info & PGC_count_mask) != 0)

in is_iomem_page() <- put_page_from_l1e() <- alloc_l1_table(), when we
ran a graphics application (a memory eater, but closed source) in dom0.
This failure only happens when dom0 is configured with 2 vCPUs.

Our debugging showed that the page->count_info in question was already
(and unexpectedly) cleared in free_xenheap_pages(), and the call trace
should be like this:

free_xenheap_pages()
    ^
    |
free_xen_pagetable()
    ^
    |
map_pages_to_xen()
    ^
    |
update_xen_mappings()
    ^
    |
get_page_from_l1e()
    ^
    |
mod_l1_entry()
    ^
    |
do_mmu_update()

We then realized that it happens when dom0 tries to update its page
table: when the cache attributes are about to be changed for the
referenced page frame, the corresponding mappings of the Xen VA space
are updated by map_pages_to_xen() as well. However, since
map_pages_to_xen() has the aforementioned race, when
MMU_NORMAL_PT_UPDATE is triggered concurrently on different CPUs, it
may mistakenly free a superpage referenced by pl2e. That's why our
ASSERT failure only happens when dom0 has more than one vCPU
configured.

As to the code base, we were running XenGT code, which has only a few
non-upstreamed patches in Xen - I believe most of them are libxl
related, and none of them is MMU related. So I believe this issue could
be triggered by a PV guest against a vanilla upstream Xen.

Is the above description convincing enough? :-)

Yu
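The race described above can be modelled outside of Xen. Below is a
deliberately simplified, self-contained C/pthreads sketch of the same
shape: two "CPUs" both decide to re-consolidate the same L2 slot, and
without re-checking the entry under the lock both of them go on to free
the L1 page table. The names (l2_entry, PSE, map_pgdir_lock,
consolidate) only mimic the Xen ones, and the page-table handling is
reduced to a flag and a counter; this illustrates the pattern, it is
not Xen code.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define PSE 0x1UL   /* "this slot maps a superpage" */

static pthread_mutex_t map_pgdir_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long l2_entry;   /* stands in for *pl2e; flags only */
static int l1_frees;             /* times the "L1 table" was freed  */

static void *consolidate(void *arg)
{
    bool recheck = *(bool *)arg;

    pthread_mutex_lock(&map_pgdir_lock);

    /* The check the patch adds: another CPU already re-consolidated
     * this slot into a superpage, so there is nothing left to free. */
    if ( recheck && (l2_entry & PSE) )
    {
        pthread_mutex_unlock(&map_pgdir_lock);
        return NULL;
    }

    l2_entry |= PSE;             /* install the superpage mapping */
    pthread_mutex_unlock(&map_pgdir_lock);

    /* Free the now-unused L1 table (free_xen_pagetable() in Xen).
     * Without the re-check, both threads reach this point. */
    __sync_fetch_and_add(&l1_frees, 1);

    return NULL;
}

static int run(bool recheck)
{
    pthread_t t1, t2;

    l2_entry = 0;
    l1_frees = 0;

    pthread_create(&t1, NULL, consolidate, &recheck);
    pthread_create(&t2, NULL, consolidate, &recheck);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    return l1_frees;
}

int main(void)
{
    printf("without re-check under the lock: L1 table freed %d time(s)\n",
           run(false));
    printf("with re-check under the lock:    L1 table freed %d time(s)\n",
           run(true));
    return 0;
}

Built with, e.g., gcc -pthread, the first line reports two frees (the
double free behind the count_info corruption seen in the ASSERT), the
second reports one.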
>>> On 13.11.17 at 11:34, <yu.c.zhang@linux.intel.com> wrote:
> Our debugging showed that the page->count_info in question was already
> (and unexpectedly) cleared in free_xenheap_pages(), and the call trace
> should be like this:
>
> free_xenheap_pages()
>     ^
>     |
> free_xen_pagetable()
>     ^
>     |
> map_pages_to_xen()
>     ^
>     |
> update_xen_mappings()
>     ^
>     |
> get_page_from_l1e()
>     ^
>     |
> mod_l1_entry()
>     ^
>     |
> do_mmu_update()

This ...

> Is the above description convincing enough? :-)

... is indeed enough for me to suggest to Julien that we take both
patches (once they're ready). But it's his decision in the end.

Jan
Hi,

On 11/13/2017 11:06 AM, Jan Beulich wrote:
>>>> On 13.11.17 at 11:34, <yu.c.zhang@linux.intel.com> wrote:
>> Our debugging showed that the page->count_info in question was already
>> (and unexpectedly) cleared in free_xenheap_pages(), and the call trace
>> should be like this:
>>
>> free_xenheap_pages()
>>     ^
>>     |
>> free_xen_pagetable()
>>     ^
>>     |
>> map_pages_to_xen()
>>     ^
>>     |
>> update_xen_mappings()
>>     ^
>>     |
>> get_page_from_l1e()
>>     ^
>>     |
>> mod_l1_entry()
>>     ^
>>     |
>> do_mmu_update()
>
> This ...
>
>> Is the above description convincing enough? :-)
>
> ... is indeed enough for me to suggest to Julien that we take both
> patches (once they're ready). But it's his decision in the end.

I will wait for the series to be ready before giving my release-ack.

Cheers,

> Jan
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a20fdca..47855fb 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4844,9 +4844,19 @@ int map_pages_to_xen(
     {
         unsigned long base_mfn;
 
-        pl1e = l2e_to_l1e(*pl2e);
         if ( locking )
             spin_lock(&map_pgdir_lock);
+
+        /* Skip the re-consolidation if it's done on another CPU. */
+        if ( l2e_get_flags(*pl2e) & _PAGE_PSE )
+        {
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
+            goto check_l3;
+        }
+
+        ol2e = *pl2e;
+        pl1e = l2e_to_l1e(ol2e);
         base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
         for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
             if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||
@@ -4854,7 +4864,6 @@ int map_pages_to_xen(
                 break;
         if ( i == L1_PAGETABLE_ENTRIES )
         {
-            ol2e = *pl2e;
             l2e_write_atomic(pl2e, l2e_from_pfn(base_mfn,
                                                 l1f_to_lNf(flags)));
             if ( locking )
@@ -4880,6 +4889,15 @@ int map_pages_to_xen(
 
         if ( locking )
             spin_lock(&map_pgdir_lock);
+
+        /* Skip the re-consolidation if it's done on another CPU. */
+        if ( l3e_get_flags(*pl3e) & _PAGE_PSE )
+        {
+            if ( locking )
+                spin_unlock(&map_pgdir_lock);
+            continue;
+        }
+
         ol3e = *pl3e;
         pl2e = l3e_to_l2e(ol3e);
         base_mfn = l2e_get_pfn(*pl2e) & ~(L2_PAGETABLE_ENTRIES *
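For completeness, a sketch of how the second (L3) hunk might look once
Jan's second review comment is folded in - ol3e read right after taking
the lock and used in the PSE test. Again an illustration only, reusing
the identifiers from the patch; it is not the committed version:

        if ( locking )
            spin_lock(&map_pgdir_lock);

        ol3e = *pl3e;

        /* Skip the re-consolidation if it's done on another CPU. */
        if ( l3e_get_flags(ol3e) & _PAGE_PSE )
        {
            if ( locking )
                spin_unlock(&map_pgdir_lock);
            continue;
        }

        pl2e = l3e_to_l2e(ol3e);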