Message ID: 1510241398-25793-1-git-send-email-yu.c.zhang@linux.intel.com (mailing list archive)
State: New, archived
>>> On 09.11.17 at 16:29, <yu.c.zhang@linux.intel.com> wrote:
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -4844,9 +4844,10 @@ int map_pages_to_xen(
>          {
>              unsigned long base_mfn;
>
> -            pl1e = l2e_to_l1e(*pl2e);
>              if ( locking )
>                  spin_lock(&map_pgdir_lock);
> +
> +            pl1e = l2e_to_l1e(*pl2e);
>              base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
>              for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
>                  if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||

I agree with the general observation, but there are three things I'd
like to see considered:

1) Please extend the change slightly such that the L2E re-consolidation
code matches the L3E one (i.e. latch into ol2e earlier and pass that one
to l2e_to_l1e()). Personally I would even prefer if the presence/absence
of blank lines matched between the two pieces of code.

2) Is your change actually enough to take care of all forms of the race
you describe? In particular, isn't it necessary to re-check PSE after
having taken the lock, in case another CPU has just finished doing the
re-consolidation?

3) What about the empty & free checks in modify_xen_mappings()?

Jan
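In code terms, point 1 would reshape the start of the hunk roughly as
follows (a sketch only, not Jan's wording; ol2e is the local variable
already referenced in the commit message below):

    if ( locking )
        spin_lock(&map_pgdir_lock);

    ol2e = *pl2e;               /* latch the L2E, as the L3E path does */
    pl1e = l2e_to_l1e(ol2e);    /* derive pl1e from the latched value */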
>>> On 09.11.17 at 16:29, <yu.c.zhang@linux.intel.com> wrote:
> In map_pages_to_xen(), an L2 page table entry may be reset to point to
> a superpage, and its corresponding L1 page table needs to be freed in
> such a scenario, when its L1 page table entries map consecutive page
> frames and carry the same mapping flags.
>
> However, the variable `pl1e` is not protected by the lock before the
> L1 page table is enumerated. A race condition may happen if this code
> path is invoked simultaneously on different CPUs.
>
> For example, `pl1e` on CPU0 may hold an obsolete value, pointing to a
> page which has just been freed on CPU1. Besides, until this page is
> reused, it will still hold the old PTEs, referencing consecutive page
> frames. Consequently, `free_xen_pagetable(l2e_to_l1e(ol2e))` will be
> triggered on CPU0, resulting in the unexpected freeing of a normal
> page.
>
> Protecting `pl1e` with the lock fixes this race condition.
>
> Signed-off-by: Min He <min.he@intel.com>
> Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>

Oh, one more thing: Is it really the case that all three of you
contributed to the patch? We don't use the Linux model of everyone
through whose hands a patch passes adding an S-o-b of their own - that
would rather be Reviewed-by then (if applicable).

Also, generally I would consider the first S-o-b to be that of the
original author, yet the absence of an explicit From: tag makes
authorship ambiguous here. Please clarify this in v2.

Jan
On 11/9/2017 5:19 PM, Jan Beulich wrote:
>>>> On 09.11.17 at 16:29, <yu.c.zhang@linux.intel.com> wrote:
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -4844,9 +4844,10 @@ int map_pages_to_xen(
>>          {
>>              unsigned long base_mfn;
>>
>> -            pl1e = l2e_to_l1e(*pl2e);
>>              if ( locking )
>>                  spin_lock(&map_pgdir_lock);
>> +
>> +            pl1e = l2e_to_l1e(*pl2e);
>>              base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
>>              for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
>>                  if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||
> I agree with the general observation, but there are three things I'd
> like to see considered:
>
> 1) Please extend the change slightly such that the L2E re-consolidation
> code matches the L3E one (i.e. latch into ol2e earlier and pass that one
> to l2e_to_l1e()). Personally I would even prefer if the presence/absence
> of blank lines matched between the two pieces of code.

Got it. Thanks.

> 2) Is your change actually enough to take care of all forms of the race
> you describe? In particular, isn't it necessary to re-check PSE after
> having taken the lock, in case another CPU has just finished doing the
> re-consolidation?

Good question. :-)

I had thought of checking PSE on pl2e, but dropped that. My
understanding was as follows: after the lock is taken, pl2e will point
either to an L1 page table in the normal case, or to a superpage if
another CPU has just finished the re-consolidation and released the
lock. And in the latter scenario, l1e_get_pfn(*pl1e) should not equal
(base_mfn + i), so the loop will break early and the re-consolidation
after it will not be triggered.

But on second thought, the above understanding rests on an assumption
about the contents of the target superpage. No matter how small the
chance is, we cannot make such an assumption.

So my suggestion is that we check PSE and, if it is set, "goto
check_l3". Is this reasonable to you?

> 3) What about the empty & free checks in modify_xen_mappings()?

Oh, thanks for the reminder. Just had a look: it seems pl1e or pl2e may
be freed more than once in the empty & free checks, due to the lack of
protection. So we'd better add a lock there too, right?

Yu
On 11/9/2017 5:22 PM, Jan Beulich wrote:
>>>> On 09.11.17 at 16:29, <yu.c.zhang@linux.intel.com> wrote:
>> In map_pages_to_xen(), an L2 page table entry may be reset to point to
>> a superpage, and its corresponding L1 page table needs to be freed in
>> such a scenario, when its L1 page table entries map consecutive page
>> frames and carry the same mapping flags.
>>
>> However, the variable `pl1e` is not protected by the lock before the
>> L1 page table is enumerated. A race condition may happen if this code
>> path is invoked simultaneously on different CPUs.
>>
>> For example, `pl1e` on CPU0 may hold an obsolete value, pointing to a
>> page which has just been freed on CPU1. Besides, until this page is
>> reused, it will still hold the old PTEs, referencing consecutive page
>> frames. Consequently, `free_xen_pagetable(l2e_to_l1e(ol2e))` will be
>> triggered on CPU0, resulting in the unexpected freeing of a normal
>> page.
>>
>> Protecting `pl1e` with the lock fixes this race condition.
>>
>> Signed-off-by: Min He <min.he@intel.com>
>> Signed-off-by: Yi Zhang <yi.z.zhang@intel.com>
>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
> Oh, one more thing: Is it really the case that all three of you
> contributed to the patch? We don't use the Linux model of everyone
> through whose hands a patch passes adding an S-o-b of their own - that
> would rather be Reviewed-by then (if applicable).
>
> Also, generally I would consider the first S-o-b to be that of the
> original author, yet the absence of an explicit From: tag makes
> authorship ambiguous here. Please clarify this in v2.

Oh, the three of us found this issue while debugging a bug together,
and Min is the author of this patch. So I'd like to add "From: Min He
<min.he@intel.com>" at the beginning of the commit message in v2. :-)

Yu

> Jan
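For illustration, a v2 header along these lines would make the
authorship explicit (a sketch based only on the names in this thread;
whether Yi Zhang's tag becomes a Reviewed-by is for the v2 posting to
settle):

    From: Min He <min.he@intel.com>

    [subject and commit message body unchanged]

    Signed-off-by: Min He <min.he@intel.com>
    Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>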
>>> On 09.11.17 at 11:24, <yu.c.zhang@linux.intel.com> wrote:
> On 11/9/2017 5:19 PM, Jan Beulich wrote:
>> 2) Is your change actually enough to take care of all forms of the race
>> you describe? In particular, isn't it necessary to re-check PSE after
>> having taken the lock, in case another CPU has just finished doing the
>> re-consolidation?
>
> Good question. :-)
>
> I had thought of checking PSE on pl2e, but dropped that. My
> understanding was as follows: after the lock is taken, pl2e will point
> either to an L1 page table in the normal case, or to a superpage if
> another CPU has just finished the re-consolidation and released the
> lock. And in the latter scenario, l1e_get_pfn(*pl1e) should not equal
> (base_mfn + i), so the loop will break early and the re-consolidation
> after it will not be triggered.
>
> But on second thought, the above understanding rests on an assumption
> about the contents of the target superpage. No matter how small the
> chance is, we cannot make such an assumption.
>
> So my suggestion is that we check PSE and, if it is set, "goto
> check_l3". Is this reasonable to you?

Yes; for the L3 case it'll be a simple "continue" afaict.

>> 3) What about the empty & free checks in modify_xen_mappings()?
>
> Oh, thanks for the reminder. Just had a look: it seems pl1e or pl2e may
> be freed more than once in the empty & free checks, due to the lack of
> protection. So we'd better add a lock there too, right?

Yes, I think so.

Jan
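Putting the agreed points together, the L2 re-consolidation path would
end up shaped roughly like this (a sketch only, not committed code: the
check_l3 label is Yu's proposal above, and the helper names are those
used elsewhere in xen/arch/x86/mm.c):

    if ( locking )
        spin_lock(&map_pgdir_lock);

    ol2e = *pl2e;                      /* latch the L2E under the lock */
    if ( l2e_get_flags(ol2e) & _PAGE_PSE )
    {
        /* Another CPU has just done the re-consolidation for us. */
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        goto check_l3;                 /* the L3 path would "continue" */
    }

    pl1e = l2e_to_l1e(ol2e);
    base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
        if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||
             (l1e_get_flags(*pl1e) != flags) )
            break;
    if ( i == L1_PAGETABLE_ENTRIES )
    {
        l2e_write_atomic(pl2e, l2e_from_pfn(base_mfn, l1f_to_lNf(flags)));
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        flush_area(virt - PAGE_SIZE,
                   FLUSH_TLB_GLOBAL | FLUSH_ORDER(PAGETABLE_ORDER));
        free_xen_pagetable(l2e_to_l1e(ol2e));  /* frees the latched table */
    }
    else if ( locking )
        spin_unlock(&map_pgdir_lock);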
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index a20fdca..9c9afa1 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4844,9 +4844,10 @@ int map_pages_to_xen(
         {
             unsigned long base_mfn;
 
-            pl1e = l2e_to_l1e(*pl2e);
             if ( locking )
                 spin_lock(&map_pgdir_lock);
+
+            pl1e = l2e_to_l1e(*pl2e);
             base_mfn = l1e_get_pfn(*pl1e) & ~(L1_PAGETABLE_ENTRIES - 1);
             for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++, pl1e++ )
                 if ( (l1e_get_pfn(*pl1e) != (base_mfn + i)) ||
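For point 3, the empty & free check in modify_xen_mappings() would need
the same treatment. A minimal sketch, assuming the existing structure of
that function (only the lock and the L2E re-check are new; nothing here
is committed code, and the check_l3 label is hypothetical as above):

    if ( locking )
        spin_lock(&map_pgdir_lock);

    /*
     * Re-check the L2E: another CPU may have freed the L1 table or
     * re-consolidated it into a superpage since the entry was read.
     */
    if ( (l2e_get_flags(*pl2e) & (_PAGE_PRESENT | _PAGE_PSE)) !=
         _PAGE_PRESENT )
    {
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        goto check_l3;
    }

    pl1e = l2e_to_l1e(*pl2e);
    for ( i = 0; i < L1_PAGETABLE_ENTRIES; i++ )
        if ( l1e_get_intpte(pl1e[i]) != 0 )
            break;
    if ( i == L1_PAGETABLE_ENTRIES )
    {
        /* Empty: zap the L2E while still locked, then free exactly once. */
        l2e_write_atomic(pl2e, l2e_empty());
        if ( locking )
            spin_unlock(&map_pgdir_lock);
        flush_area(NULL, FLUSH_TLB_GLOBAL);
        free_xen_pagetable(pl1e);
    }
    else if ( locking )
        spin_unlock(&map_pgdir_lock);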