Message ID | 1668147701-4583-2-git-send-email-mikelley@microsoft.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Drivers: hv: Add PCI pass-thru support to Hyper-V Confidential VMs | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
On 11/10/22 22:21, Michael Kelley wrote: > If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes > upper bits to be masked out, the re-calculation of size to account for > page alignment is incorrect because the same bits are not masked out > in last_addr. > > Fix this by masking the page aligned last_addr as well. This makes sense at first glance. How did you notice this? What is the impact to users? Did the bug actually cause you some trouble or was it by inspection? Do you have a sense of how many folks might be impacted? Any thoughts on how it lasted for 14+ years? For the functionality of the mapping, I guess 'size' doesn't really matter because even a 1-byte 'size' will map a page. The other fallout would be from memtype_reserve() reserving too little. But, that's unlikely to matter for small mappings because even though: ioremap(0x1800, 0x800); would end up just reserving 0x1000->0x1800, it still wouldn't allow ioremap(0x1000, 0x800); to succeed because *both* of them would end up trying to reserve the beginning of the page. Basically, the first caller effectively reserves the whole page and any second user will fail. So the other place it would matter would be for mappings that span two pages, say: ioremap(0x1fff, 0x2) But I guess those aren't very common. Most large ioremap() callers seem to already have base and size page-aligned. Anyway, sorry to make so much of a big deal about a one-liner. But, these decade-old bugs really make me wonder how they stuck around for so long. I'd be curious if you thought about this too while putting together this fox.
From: Dave Hansen <dave.hansen@intel.com> Sent: Friday, November 11, 2022 4:12 PM > > On 11/10/22 22:21, Michael Kelley wrote: > > If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes > > upper bits to be masked out, the re-calculation of size to account for > > page alignment is incorrect because the same bits are not masked out > > in last_addr. > > > > Fix this by masking the page aligned last_addr as well. > > This makes sense at first glance. > > How did you notice this? What is the impact to users? Did the bug > actually cause you some trouble or was it by inspection? Do you have a > sense of how many folks might be impacted? Any thoughts on how it > lasted for 14+ years? > > For the functionality of the mapping, I guess 'size' doesn't really > matter because even a 1-byte 'size' will map a page. The other fallout > would be from memtype_reserve() reserving too little. But, that's > unlikely to matter for small mappings because even though: > > ioremap(0x1800, 0x800); > > would end up just reserving 0x1000->0x1800, it still wouldn't allow > > ioremap(0x1000, 0x800); > > to succeed because *both* of them would end up trying to reserve the > beginning of the page. Basically, the first caller effectively reserves > the whole page and any second user will fail. > > So the other place it would matter would be for mappings that span two > pages, say: > > ioremap(0x1fff, 0x2) > > But I guess those aren't very common. Most large ioremap() callers seem > to already have base and size page-aligned. > > Anyway, sorry to make so much of a big deal about a one-liner. But, > these decade-old bugs really make me wonder how they stuck around for so > long. > > I'd be curious if you thought about this too while putting together this > fox. The bug only manifests if the phys_addr input argument exceeds PHYSICAL_PAGE_MASK, which is the global variable physical_mask, which is the size of the machine's or VM's physical address space. That's the only case in which masking with PHYSICAL_PAGE_MASK changes anything. So I don't see that your examples fit the situation. In the case where the masking does clear some high order bits, the "size" calculation yields a huge number which then quickly causes an error. With that understanding, I'd guess that over the last 14 years, the bug has never manifested, or if it did, it was due to something badly broken in the caller. It's not clear why masking with PHYSICAL_PAGE_MASK is there in the first place, other than as a "safety check" on the phys_addr input argument that wasn't done quite correctly. I hit the issue because this patch series does a *transition* in how the AMD SNP "vTOM" bit is handled. vTOM is bit 46 in a 47-bit physical address space -- i.e., it's the high order bit. Current code treats the vTOM bit as part of the physical address, and current code passes addresses with vTOM set into __ioremap_caller() and everything works. But Patch 5 of this patch series changes the underlying global variable physical_mask to remove bit 46, similar to what tdx_early_init() does. At that point, passing __ioremap_caller() a phys_addr with the vTOM bit set causes the bug and a failure. With the fix, Patch 5 in this series causes __ioremap_caller() to mask out the vTOM bit, which is what I want, at least temporarily. Later patches in the series change things so that we no longer pass a phys_addr to __ioremap_caller() that has the vTOM bit set. After those later patches, this fix to __ioremap_caller() isn't needed. But I wanted to avoid cramming all the vTOM-related changes into a single huge patch. Having __ioremap_caller() correctly handle a phys_addr that exceeds physical_mask instead of blowing up let's this patch series sequence things into reasonable size chunks. And given that the __ioremap_caller() code is wrong regardless, fixing it seemed like a reasonable overall solution. Michael
On 11/10/22 22:21, Michael Kelley wrote: > --- a/arch/x86/mm/ioremap.c > +++ b/arch/x86/mm/ioremap.c > @@ -218,7 +218,7 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, > */ > offset = phys_addr & ~PAGE_MASK; > phys_addr &= PHYSICAL_PAGE_MASK; > - size = PAGE_ALIGN(last_addr+1) - phys_addr; > + size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr; Michael, thanks for the explanation in your other reply. First and foremost, I *totally* missed the reason for this patch. I was thinking about issues that could pop up from the _lower_ bits being masked off. Granted, your changelog _did_ say "upper bits", so shame on me. But, it would be great to put some more background in the changelog to make it a bit harder for silly reviewers to miss such things. I'd also like to propose something that I think is more straightforward: /* * Mappings have to be page-aligned */ offset = phys_addr & ~PAGE_MASK; phys_addr &= PAGE_MASK; size = PAGE_ALIGN(last_addr+1) - phys_addr; /* * Mask out any bits not parts of the actual physical * address, like memory encryption bits. */ phys_addr &= PHYSICAL_PAGE_MASK; Because, first of all, that "Mappings have to be page-aligned" thing is (now) doing more than page-aligning things. Second, the moment you mask out the metadata bits, the 'size' calculation gets harder. Doing it in two phases (page alignment followed by metadata bit masking) breaks up the two logical operations.
From: Dave Hansen <dave.hansen@intel.com> Sent: Monday, November 14, 2022 8:40 AM > > On 11/10/22 22:21, Michael Kelley wrote: > > --- a/arch/x86/mm/ioremap.c > > +++ b/arch/x86/mm/ioremap.c > > @@ -218,7 +218,7 @@ static void __ioremap_check_mem(resource_size_t addr, > unsigned long size, > > */ > > offset = phys_addr & ~PAGE_MASK; > > phys_addr &= PHYSICAL_PAGE_MASK; > > - size = PAGE_ALIGN(last_addr+1) - phys_addr; > > + size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr; > > Michael, thanks for the explanation in your other reply. First and > foremost, I *totally* missed the reason for this patch. I was thinking > about issues that could pop up from the _lower_ bits being masked off. > > Granted, your changelog _did_ say "upper bits", so shame on me. But, it > would be great to put some more background in the changelog to make it a > bit harder for silly reviewers to miss such things. > > I'd also like to propose something that I think is more straightforward: > > /* > * Mappings have to be page-aligned > */ > offset = phys_addr & ~PAGE_MASK; > phys_addr &= PAGE_MASK; > size = PAGE_ALIGN(last_addr+1) - phys_addr; > > /* > * Mask out any bits not parts of the actual physical > * address, like memory encryption bits. > */ > phys_addr &= PHYSICAL_PAGE_MASK; > > Because, first of all, that "Mappings have to be page-aligned" thing is > (now) doing more than page-aligning things. Second, the moment you mask > out the metadata bits, the 'size' calculation gets harder. Doing it in > two phases (page alignment followed by metadata bit masking) breaks up > the two logical operations. > Work for me. Will do this in v3. Michael
On 11/14/22 08:53, Michael Kelley (LINUX) wrote: >> Because, first of all, that "Mappings have to be page-aligned" thing is >> (now) doing more than page-aligning things. Second, the moment you mask >> out the metadata bits, the 'size' calculation gets harder. Doing it in >> two phases (page alignment followed by metadata bit masking) breaks up >> the two logical operations. >> > Work for me. Will do this in v3. Kirill also made a good point about TDX: it isn't affected by this because it always passes *real* (no metadata bits set) physical addresses in here. Could you double check that you don't want to do the same for your code?
From: Dave Hansen <dave.hansen@intel.com> Sent: Monday, November 14, 2022 8:58 AM > > On 11/14/22 08:53, Michael Kelley (LINUX) wrote: > >> Because, first of all, that "Mappings have to be page-aligned" thing is > >> (now) doing more than page-aligning things. Second, the moment you mask > >> out the metadata bits, the 'size' calculation gets harder. Doing it in > >> two phases (page alignment followed by metadata bit masking) breaks up > >> the two logical operations. > >> > > Work for me. Will do this in v3. > > Kirill also made a good point about TDX: it isn't affected by this > because it always passes *real* (no metadata bits set) physical > addresses in here. Could you double check that you don't want to do the > same for your code? > Yes, we want to do the same for the Hyper-V vTOM code. And when this full patch set is applied, we're only passing in *real* physical addresses and are not depending on __ioremap_caller() doing any masking. But this patch set is executing a transition from current code, which passes physical addresses with metadata bits set (i.e., the vTOM bit), to the new approach, which does not. There are several places in the current Hyper-V vTOM code that need changes to make this transition. These changes are non-trivial, and I don't want to have to cram them all into one big patch. By making this fix, the current code continues to work throughout this patch series while the changes are incrementally made in multiple individual patches. But when it's all done, we won't be passing any physical addresses with the vTOM bit set. Note that current code works and doesn't hit the bug because the global variable physical_mask includes the vTOM bit as part of the physical address. But Patch 5 of the series removes the vTOM bit from physical_mask. At that point, the current __ioremap_caller() code breaks due to the bug. By fixing the bug, the current Hyper-V vTOM code continues to work until all the changes can be completed (which is Patch 10 of the series). Perhaps it's convoluted, but basically I'm trying to avoid having to merge Patches 5 thru 10 into one big patch. And since the current __ioremap_caller() is wrong anyway, fixing it made everything smoother. Michael
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index 78c5bc6..0343de4 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -218,7 +218,7 @@ static void __ioremap_check_mem(resource_size_t addr, unsigned long size, */ offset = phys_addr & ~PAGE_MASK; phys_addr &= PHYSICAL_PAGE_MASK; - size = PAGE_ALIGN(last_addr+1) - phys_addr; + size = (PAGE_ALIGN(last_addr+1) & PHYSICAL_PAGE_MASK) - phys_addr; retval = memtype_reserve(phys_addr, (u64)phys_addr + size, pcm, &new_pcm);
If applying the PHYSICAL_PAGE_MASK to the phys_addr argument causes upper bits to be masked out, the re-calculation of size to account for page alignment is incorrect because the same bits are not masked out in last_addr. Fix this by masking the page aligned last_addr as well. Signed-off-by: Michael Kelley <mikelley@microsoft.com> --- arch/x86/mm/ioremap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)