Message ID | 20250319142011.138250-1-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/mm: Fix IS_ALIGNED() check in IS_LnE_ALIGNED() | expand |
On 19.03.2025 15:20, Andrew Cooper wrote: > The current CI failures happen to be a latent bug triggered by a narrow set of > properties of the initrd, which CI encountered by chance. Plus properties of the host memory map. > One step during boot involves constructing directmap mappings for modules. > With some probing at the point of creation, it is observed that there's a 4k > mapping missing towards the end of the initrd. > > (XEN) === Mapped Mod1 [0000000394001000, 00000003be1ff6dc] to Directmap > (XEN) Probing paddr 394001000, va ffff830394001000 > (XEN) Probing paddr 3be1ff6db, va ffff8303be1ff6db > (XEN) Probing paddr 3bdffffff, va ffff8303bdffffff > (XEN) Probing paddr 3be001000, va ffff8303be001000 > (XEN) Probing paddr 3be000000, va ffff8303be000000 > (XEN) Early fatal page fault at e008:ffff82d04032014c (cr2=ffff8303be000000, ec=0000) > > The conditions for this bug appear to be map_pages_to_xen() call with a non-2M > aligned start address, some number of full 2M pages, then a tail needing 4k > pages. > > Anyway, the condition for spotting superpage boundaries in map_pages_to_xen() > is wrong. The IS_ALIGNED() macro expects a power of two for the alignment > argument, and subtracts 1 itself. > > Fixing this causes the failing case to now boot. > > Fixes: 97fb6fcf26e8 ("x86/mm: introduce helpers to detect super page alignment") > Debugged-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> > Judging by how IS_ALIGNED() is wrong, I think the pre-condition might be > exactly 4k past a 2M boundary, not just simply a non-2M boundary. That's the understanding I have gained, yes. > --- a/xen/arch/x86/mm.c > +++ b/xen/arch/x86/mm.c > @@ -5505,7 +5505,7 @@ int map_pages_to_xen( > \ > ASSERT(!mfn_eq(m_, INVALID_MFN)); \ > IS_ALIGNED(PFN_DOWN(v) | mfn_x(m_), \ > - (1UL << (PAGETABLE_ORDER * ((n) - 1))) - 1); \ > + (1UL << (PAGETABLE_ORDER * ((n) - 1)))); \ Can we also get rid of the now unnecessary outermost pair of parentheses of that operand, please? Imo any reduction in parentheses in constructs like this helps readability. Jan
On 19/03/2025 2:29 pm, Jan Beulich wrote: > On 19.03.2025 15:20, Andrew Cooper wrote: >> The current CI failures happen to be a latent bug triggered by a narrow set of >> properties of the initrd, which CI encountered by chance. > Plus properties of the host memory map. > >> One step during boot involves constructing directmap mappings for modules. >> With some probing at the point of creation, it is observed that there's a 4k >> mapping missing towards the end of the initrd. >> >> (XEN) === Mapped Mod1 [0000000394001000, 00000003be1ff6dc] to Directmap >> (XEN) Probing paddr 394001000, va ffff830394001000 >> (XEN) Probing paddr 3be1ff6db, va ffff8303be1ff6db >> (XEN) Probing paddr 3bdffffff, va ffff8303bdffffff >> (XEN) Probing paddr 3be001000, va ffff8303be001000 >> (XEN) Probing paddr 3be000000, va ffff8303be000000 >> (XEN) Early fatal page fault at e008:ffff82d04032014c (cr2=ffff8303be000000, ec=0000) >> >> The conditions for this bug appear to be map_pages_to_xen() call with a non-2M >> aligned start address, some number of full 2M pages, then a tail needing 4k >> pages. >> >> Anyway, the condition for spotting superpage boundaries in map_pages_to_xen() >> is wrong. The IS_ALIGNED() macro expects a power of two for the alignment >> argument, and subtracts 1 itself. >> >> Fixing this causes the failing case to now boot. >> >> Fixes: 97fb6fcf26e8 ("x86/mm: introduce helpers to detect super page alignment") >> Debugged-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> > >> Judging by how IS_ALIGNED() is wrong, I think the pre-condition might be >> exactly 4k past a 2M boundary, not just simply a non-2M boundary. > That's the understanding I have gained, yes. > >> --- a/xen/arch/x86/mm.c >> +++ b/xen/arch/x86/mm.c >> @@ -5505,7 +5505,7 @@ int map_pages_to_xen( >> \ >> ASSERT(!mfn_eq(m_, INVALID_MFN)); \ >> IS_ALIGNED(PFN_DOWN(v) | mfn_x(m_), \ >> - (1UL << (PAGETABLE_ORDER * ((n) - 1))) - 1); \ >> + (1UL << (PAGETABLE_ORDER * ((n) - 1)))); \ > Can we also get rid of the now unnecessary outermost pair of parentheses > of that operand, please? Imo any reduction in parentheses in constructs > like this helps readability. Ok. I'll adjust all of these, and get the fix in. Thanks. ~Andrew
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c index 03b8319f7a9d..e0793ad3462b 100644 --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -5505,7 +5505,7 @@ int map_pages_to_xen( \ ASSERT(!mfn_eq(m_, INVALID_MFN)); \ IS_ALIGNED(PFN_DOWN(v) | mfn_x(m_), \ - (1UL << (PAGETABLE_ORDER * ((n) - 1))) - 1); \ + (1UL << (PAGETABLE_ORDER * ((n) - 1)))); \ }) #define IS_L2E_ALIGNED(v, m) IS_LnE_ALIGNED(v, m, 2) #define IS_L3E_ALIGNED(v, m) IS_LnE_ALIGNED(v, m, 3)