Message ID: 20240701095505.165383-6-steven.price@arm.com (mailing list archive)
State: New, archived
Series: arm64: Support for running as a guest in Arm CCA
On Mon, Jul 01, 2024 at 10:54:55AM +0100, Steven Price wrote: > All I/O is by default considered non-secure for realms. As such > mark them as shared with the host. > > Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > Signed-off-by: Steven Price <steven.price@arm.com> > --- > Changes since v3: > * Add PROT_NS_SHARED to FIXMAP_PAGE_IO rather than overriding > set_fixmap_io() with a custom function. > * Modify ioreamp_cache() to specify PROT_NS_SHARED too. > --- > arch/arm64/include/asm/fixmap.h | 2 +- > arch/arm64/include/asm/io.h | 8 ++++---- > 2 files changed, 5 insertions(+), 5 deletions(-) > > diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h > index 87e307804b99..f2c5e653562e 100644 > --- a/arch/arm64/include/asm/fixmap.h > +++ b/arch/arm64/include/asm/fixmap.h > @@ -98,7 +98,7 @@ enum fixed_addresses { > #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) > #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) > > -#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE) > +#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED) > > void __init early_fixmap_init(void); > > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h > index 4ff0ae3f6d66..07fc1801c6ad 100644 > --- a/arch/arm64/include/asm/io.h > +++ b/arch/arm64/include/asm/io.h > @@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void __iomem *to, const void *from, > > #define ioremap_prot ioremap_prot > > -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE > +#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED) > > #define ioremap_wc(addr, size) \ > - ioremap_prot((addr), (size), PROT_NORMAL_NC) > + ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED)) > #define ioremap_np(addr, size) \ > - ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE) > + ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED)) Hmm. 
I do wonder whether you've pushed the PROT_NS_SHARED too far here. There's nothing _architecturally_ special about the top address bit. Even if the RSI divides the IPA space in half, the CPU doesn't give two hoots about it in the hardware. In which case, it feels wrong to bake PROT_NS_SHARED into ioremap_prot -- it feels much better to me if the ioremap() code OR'd that into the physical address when passing it down There's a selfish side of that argument, in that we need to hook ioremap() for pKVM protected guests, but I do genuinely feel that treating address bits as protection bits is arbitrary and doesn't belong in these low-level definitions. In a similar vein, AMD has its sme_{set,clr}() macros that operate on the PA (e.g. via dma_to_phys()), which feels like a more accurate abstraction to me. Will
Hi Will On 09/07/2024 12:39, Will Deacon wrote: > On Mon, Jul 01, 2024 at 10:54:55AM +0100, Steven Price wrote: >> All I/O is by default considered non-secure for realms. As such >> mark them as shared with the host. >> >> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> Signed-off-by: Steven Price <steven.price@arm.com> >> --- >> Changes since v3: >> * Add PROT_NS_SHARED to FIXMAP_PAGE_IO rather than overriding >> set_fixmap_io() with a custom function. >> * Modify ioreamp_cache() to specify PROT_NS_SHARED too. >> --- >> arch/arm64/include/asm/fixmap.h | 2 +- >> arch/arm64/include/asm/io.h | 8 ++++---- >> 2 files changed, 5 insertions(+), 5 deletions(-) >> >> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h >> index 87e307804b99..f2c5e653562e 100644 >> --- a/arch/arm64/include/asm/fixmap.h >> +++ b/arch/arm64/include/asm/fixmap.h >> @@ -98,7 +98,7 @@ enum fixed_addresses { >> #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) >> #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) >> >> -#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE) >> +#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED) >> >> void __init early_fixmap_init(void); >> >> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h >> index 4ff0ae3f6d66..07fc1801c6ad 100644 >> --- a/arch/arm64/include/asm/io.h >> +++ b/arch/arm64/include/asm/io.h >> @@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void __iomem *to, const void *from, >> >> #define ioremap_prot ioremap_prot >> >> -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE >> +#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED) >> >> #define ioremap_wc(addr, size) \ >> - ioremap_prot((addr), (size), PROT_NORMAL_NC) >> + ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED)) >> #define ioremap_np(addr, size) \ >> - ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE) >> + 
ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED)) > > Hmm. I do wonder whether you've pushed the PROT_NS_SHARED too far here. > > There's nothing _architecturally_ special about the top address bit. > Even if the RSI divides the IPA space in half, the CPU doesn't give two > hoots about it in the hardware. In which case, it feels wrong to bake > PROT_NS_SHARED into ioremap_prot -- it feels much better to me if the > ioremap() code OR'd that into the physical address when passing it down Actually we would like to push the decision of applying "pgprot_decrypted" vs "pgprot_encrypted" into ioremap_prot() itself, rather than sprinkling it across every user of ioremap_prot(). This could be decided based on the address that is passed to ioremap_prot(). I guess we would need explicit requests from the callers to add "encrypted vs decrypted". Is that what you guys are looking at? > > There's a selfish side of that argument, in that we need to hook > ioremap() for pKVM protected guests, but I do genuinely feel that > treating address bits as protection bits is arbitrary and doesn't belong > in these low-level definitions. In a similar vein, AMD has its > sme_{set,clr}() macros that operate on the PA (e.g. via dma_to_phys()), > which feels like a more accurate abstraction to me. I believe that doesn't solve all the problems. They do have a hook in __ioremap_caller() that implicitly applies pgprot_{en,de}crypted depending on other info. Cheers Suzuki > > Will
On 09/07/2024 13:54, Suzuki K Poulose wrote: > Hi Will > > On 09/07/2024 12:39, Will Deacon wrote: >> On Mon, Jul 01, 2024 at 10:54:55AM +0100, Steven Price wrote: >>> All I/O is by default considered non-secure for realms. As such >>> mark them as shared with the host. >>> >>> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> >>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >>> Signed-off-by: Steven Price <steven.price@arm.com> >>> --- >>> Changes since v3: >>> * Add PROT_NS_SHARED to FIXMAP_PAGE_IO rather than overriding >>> set_fixmap_io() with a custom function. >>> * Modify ioreamp_cache() to specify PROT_NS_SHARED too. >>> --- >>> arch/arm64/include/asm/fixmap.h | 2 +- >>> arch/arm64/include/asm/io.h | 8 ++++---- >>> 2 files changed, 5 insertions(+), 5 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/fixmap.h >>> b/arch/arm64/include/asm/fixmap.h >>> index 87e307804b99..f2c5e653562e 100644 >>> --- a/arch/arm64/include/asm/fixmap.h >>> +++ b/arch/arm64/include/asm/fixmap.h >>> @@ -98,7 +98,7 @@ enum fixed_addresses { >>> #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) >>> #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) >>> -#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE) >>> +#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED) >>> void __init early_fixmap_init(void); >>> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h >>> index 4ff0ae3f6d66..07fc1801c6ad 100644 >>> --- a/arch/arm64/include/asm/io.h >>> +++ b/arch/arm64/include/asm/io.h >>> @@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void >>> __iomem *to, const void *from, >>> #define ioremap_prot ioremap_prot >>> -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE >>> +#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED) >>> #define ioremap_wc(addr, size) \ >>> - ioremap_prot((addr), (size), PROT_NORMAL_NC) >>> + ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED)) >>> #define 
ioremap_np(addr, size) \ >>> - ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE) >>> + ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED)) >> >> Hmm. I do wonder whether you've pushed the PROT_NS_SHARED too far here. >> >> There's nothing _architecturally_ special about the top address bit. >> Even if the RSI divides the IPA space in half, the CPU doesn't give two >> hoots about it in the hardware. In which case, it feels wrong to bake >> PROT_NS_SHARED into ioremap_prot -- it feels much better to me if the >> ioremap() code OR'd that into the physical address when passing it down This is really just a simplification given we don't (yet) have device assignment. > Actually we would like to push the decision of applying the > "pgprot_decrypted" vs pgprot_encrypted into ioremap_prot(), rather > than sprinkling every user of ioremap_prot(). > > This could be made depending on the address that is passed on to the > ioremap_prot(). I guess we would need explicit requests from the callers > to add "encrypted vs decrypted". Is that what you guys are looking at ? There's a missing piece at the moment in terms of how the guest is going to identify whether a particular device is protected or shared (i.e. a real assigned device, or emulated by the VMM). When that's added then I was expecting ioremap_prot() to provide that flag based on discovering whether the address range passed in is for an assigned device or not. >> >> There's a selfish side of that argument, in that we need to hook >> ioremap() for pKVM protected guests, but I do genuinely feel that >> treating address bits as protection bits is arbitrary and doesn't belong >> in these low-level definitions. In a similar vein, AMD has its I'd be interested to see how pKVM will handle both protected and emulated (by the VMM) devices. Although we have the 'top bit' flag it's actually a pain to pass that down to the guest as a flag to use for this purpose (e.g. 32 bit PCI BARs are too small). 
So our current thought is an out-of-band request to identify whether a particular address corresponds to a protected device or not. We'd then set the top bit appropriately. >> sme_{set,clr}() macros that operate on the PA (e.g. via dma_to_phys()), >> which feels like a more accurate abstraction to me. > > I believe that doesn't solve all the problems. They do have a hook in > __ioremap_caller() that implicitly applies pgprot_{en,de}crypted > depending on other info. This is the other option - which pushes the knowledge down to the individual drivers to decide whether a region is 'encrypted' (i.e. protected) or not. It's more flexible, but potentially requires 'fixing' many drivers to understand this. Thanks, Steve > Cheers > Suzuki > >> >> Will >
On 7/1/24 7:54 PM, Steven Price wrote: > All I/O is by default considered non-secure for realms. As such > mark them as shared with the host. > > Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> > Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> > Signed-off-by: Steven Price <steven.price@arm.com> > --- > Changes since v3: > * Add PROT_NS_SHARED to FIXMAP_PAGE_IO rather than overriding > set_fixmap_io() with a custom function. > * Modify ioreamp_cache() to specify PROT_NS_SHARED too. > --- > arch/arm64/include/asm/fixmap.h | 2 +- > arch/arm64/include/asm/io.h | 8 ++++---- > 2 files changed, 5 insertions(+), 5 deletions(-) > I'm unable to understand this. Steven, could you please explain a bit how PROT_NS_SHARED is turned into a shared (non-secure) mapping in hardware? According to tf-rmm's implementation in tf-rmm/lib/s2tt/src/s2tt_pvt_defs.h, a shared (non-secure) mapping is identified by the NS bit (bit#55). I find it difficult to see how the NS bit correlates with PROT_NS_SHARED. For example, how is the NS bit set based on PROT_NS_SHARED?
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h > index 87e307804b99..f2c5e653562e 100644 > --- a/arch/arm64/include/asm/fixmap.h > +++ b/arch/arm64/include/asm/fixmap.h > @@ -98,7 +98,7 @@ enum fixed_addresses { > #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) > #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) > > -#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE) > +#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED) > > void __init early_fixmap_init(void); > > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h > index 4ff0ae3f6d66..07fc1801c6ad 100644 > --- a/arch/arm64/include/asm/io.h > +++ b/arch/arm64/include/asm/io.h > @@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void __iomem *to, const void *from, > > #define ioremap_prot ioremap_prot > > -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE > +#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED) > > #define ioremap_wc(addr, size) \ > - ioremap_prot((addr), (size), PROT_NORMAL_NC) > + ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED)) > #define ioremap_np(addr, size) \ > - ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE) > + ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED)) > > /* > * io{read,write}{16,32,64}be() macros > @@ -303,7 +303,7 @@ static inline void __iomem *ioremap_cache(phys_addr_t addr, size_t size) > if (pfn_is_map_memory(__phys_to_pfn(addr))) > return (void __iomem *)__phys_to_virt(addr); > > - return ioremap_prot(addr, size, PROT_NORMAL); > + return ioremap_prot(addr, size, PROT_NORMAL | PROT_NS_SHARED); > } > > /* Thanks, Gavin
Hi Gavin, Thanks for looking at the patch. Responses inline. On 30/07/2024 02:36, Gavin Shan wrote: > On 7/1/24 7:54 PM, Steven Price wrote: >> All I/O is by default considered non-secure for realms. As such >> mark them as shared with the host. >> >> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> >> Signed-off-by: Steven Price <steven.price@arm.com> >> --- >> Changes since v3: >> * Add PROT_NS_SHARED to FIXMAP_PAGE_IO rather than overriding >> set_fixmap_io() with a custom function. >> * Modify ioreamp_cache() to specify PROT_NS_SHARED too. >> --- >> arch/arm64/include/asm/fixmap.h | 2 +- >> arch/arm64/include/asm/io.h | 8 ++++---- >> 2 files changed, 5 insertions(+), 5 deletions(-) >> > > I'm unable to understand this. Steven, could you please explain a bit how > PROT_NS_SHARED is turned to a shared (non-secure) mapping to hardware? > According to tf-rmm's implementation in > tf-rmm/lib/s2tt/src/s2tt_pvt_defs.h, > a shared (non-secure) mapping is is identified by NS bit (bit#55). I find > difficulties how the NS bit is correlate with PROT_NS_SHARED. For example, > how the NS bit is set based on PROT_NS_SHARED. There are two things at play here : 1. Stage1 mapping controlled by the Realm (Linux in this case, as above). 2. Stage2 mapping controlled by the RMM (with RMI commands from NS Host). Also : The Realm's IPA space is divided into two halves (decided by the IPA Width of the Realm, not the NSbit #55), protected (Lower half) and Unprotected (Upper half). All stage2 mappings of the "Unprotected IPA" will have the NS bit (#55) set by the RMM. By design, any MMIO access to an unprotected half is sent to the NS Host by RMM and any page the Realm wants to share with the Host must be in the Upper half of the IPA. What we do above is controlling the "Stage1" used by the Linux. i.e, for a given VA, we flip the Guest "PA" (in reality IPA) to the "Unprotected" alias. 
e.g., DTB describes a UART at address 0x10_0000 to Realm (with an IPA width of 40, like in the normal VM case), emulated by the host. Realm is trying to map this I/O address into Stage1 at VA. So we apply the BIT(39) as PROT_NS_SHARED while creating the Stage1 mapping. ie., VA == stage1 ==> BIT(39) | 0x10_0000 =(IPA)== > 0x80_10_0000 Now, the Stage2 mapping won't be present for this IPA if it is emulated and thus an access to "VA" causes a Stage2 Abort to the Host, which the RMM allows the host to emulate. Otherwise a shared page would have been mapped by the Host (and NS bit set at Stage2 by RMM), allowing the data to be shared with the host. Does that answer your question ? Suzuki > >> diff --git a/arch/arm64/include/asm/fixmap.h >> b/arch/arm64/include/asm/fixmap.h >> index 87e307804b99..f2c5e653562e 100644 >> --- a/arch/arm64/include/asm/fixmap.h >> +++ b/arch/arm64/include/asm/fixmap.h >> @@ -98,7 +98,7 @@ enum fixed_addresses { >> #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT) >> #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE) >> -#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE) >> +#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED) >> void __init early_fixmap_init(void); >> diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h >> index 4ff0ae3f6d66..07fc1801c6ad 100644 >> --- a/arch/arm64/include/asm/io.h >> +++ b/arch/arm64/include/asm/io.h >> @@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void >> __iomem *to, const void *from, >> #define ioremap_prot ioremap_prot >> -#define _PAGE_IOREMAP PROT_DEVICE_nGnRE >> +#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED) >> #define ioremap_wc(addr, size) \ >> - ioremap_prot((addr), (size), PROT_NORMAL_NC) >> + ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED)) >> #define ioremap_np(addr, size) \ >> - ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE) >> + ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | 
PROT_NS_SHARED)) >> /* >> * io{read,write}{16,32,64}be() macros >> @@ -303,7 +303,7 @@ static inline void __iomem >> *ioremap_cache(phys_addr_t addr, size_t size) >> if (pfn_is_map_memory(__phys_to_pfn(addr))) >> return (void __iomem *)__phys_to_virt(addr); >> - return ioremap_prot(addr, size, PROT_NORMAL); >> + return ioremap_prot(addr, size, PROT_NORMAL | PROT_NS_SHARED); >> } >> /* > > Thanks, > Gavin >
Hi Suzuki, On 7/30/24 8:36 PM, Suzuki K Poulose wrote: > On 30/07/2024 02:36, Gavin Shan wrote: >> On 7/1/24 7:54 PM, Steven Price wrote: >> I'm unable to understand this. Steven, could you please explain a bit how >> PROT_NS_SHARED is turned to a shared (non-secure) mapping to hardware? >> According to tf-rmm's implementation in tf-rmm/lib/s2tt/src/s2tt_pvt_defs.h, >> a shared (non-secure) mapping is is identified by NS bit (bit#55). I find >> difficulties how the NS bit is correlate with PROT_NS_SHARED. For example, >> how the NS bit is set based on PROT_NS_SHARED. > > > There are two things at play here : > > 1. Stage1 mapping controlled by the Realm (Linux in this case, as above). > 2. Stage2 mapping controlled by the RMM (with RMI commands from NS Host). > > Also : > The Realm's IPA space is divided into two halves (decided by the IPA Width of the Realm, not the NSbit #55), protected (Lower half) and > Unprotected (Upper half). All stage2 mappings of the "Unprotected IPA" > will have the NS bit (#55) set by the RMM. By design, any MMIO access > to an unprotected half is sent to the NS Host by RMM and any page > the Realm wants to share with the Host must be in the Upper half > of the IPA. > > What we do above is controlling the "Stage1" used by the Linux. i.e, > for a given VA, we flip the Guest "PA" (in reality IPA) to the > "Unprotected" alias. > > e.g., DTB describes a UART at address 0x10_0000 to Realm (with an IPA width of 40, like in the normal VM case), emulated by the host. Realm is > trying to map this I/O address into Stage1 at VA. So we apply the > BIT(39) as PROT_NS_SHARED while creating the Stage1 mapping. > > ie., VA == stage1 ==> BIT(39) | 0x10_0000 =(IPA)== > 0x80_10_0000 > 0x8000_10_0000 > Now, the Stage2 mapping won't be present for this IPA if it is emulated > and thus an access to "VA" causes a Stage2 Abort to the Host, which the > RMM allows the host to emulate. 
Otherwise a shared page would have been > mapped by the Host (and NS bit set at Stage2 by RMM), allowing the > data to be shared with the host. > Thank you for the explanation and details. It really helps to understand how the access fault to the unprotected space (upper half) is routed to the NS host, and then to the VMM (QEMU) for emulation. If the commit log can be improved with that information, it will make it easier for readers to understand the code. I had the following call trace, and it seems the address 0x8000_10_1000 is converted to 0x10_1000 in [1], based on the current code base (branch: cca-full/v3). At [1], the GPA is masked with kvm_gpa_stolen_bits() so that BIT#39 is removed in this particular case.

kvm_vcpu_ioctl(KVM_RUN)             // non-secured host
  kvm_arch_vcpu_ioctl_run
    kvm_rec_enter
      rmi_rec_enter                 // -> SMC_RMI_REC_ENTER
      :
      rmm_handler                   // tf-rmm
        handle_ns_smc
          smc_rec_enter
            rec_run_loop
              run_realm
              :
              el2_vectors
                el2_sync_lel
                  realm_exit
              :
          handle_realm_exit
            handle_exception_sync
              handle_data_abort
      :
  handle_rme_exit                   // non-secured host
    rec_exit_sync_dabt
      kvm_handle_guest_abort        // -> [1]
        gfn_to_memslot
        io_mem_abort
          kvm_io_bus_write          // -> run->exit_reason = KVM_EXIT_MMIO

Another question is how the Granule Protection Check (GPC) table is updated so that the corresponding granule (0x8000_10_1000) is accessible by the NS host. I mean, how is BIT#39 synchronized to the GPC table and translated to the property "granule is accessible by NS host"? Thanks, Gavin
On 31/07/2024 07:36, Gavin Shan wrote: > Hi Suzuki, > > On 7/30/24 8:36 PM, Suzuki K Poulose wrote: >> On 30/07/2024 02:36, Gavin Shan wrote: >>> On 7/1/24 7:54 PM, Steven Price wrote: >>> I'm unable to understand this. Steven, could you please explain a bit >>> how >>> PROT_NS_SHARED is turned to a shared (non-secure) mapping to hardware? >>> According to tf-rmm's implementation in >>> tf-rmm/lib/s2tt/src/s2tt_pvt_defs.h, >>> a shared (non-secure) mapping is is identified by NS bit (bit#55). I >>> find >>> difficulties how the NS bit is correlate with PROT_NS_SHARED. For >>> example, >>> how the NS bit is set based on PROT_NS_SHARED. >> >> >> There are two things at play here : >> >> 1. Stage1 mapping controlled by the Realm (Linux in this case, as above). >> 2. Stage2 mapping controlled by the RMM (with RMI commands from NS Host). >> >> Also : >> The Realm's IPA space is divided into two halves (decided by the IPA >> Width of the Realm, not the NSbit #55), protected (Lower half) and >> Unprotected (Upper half). All stage2 mappings of the "Unprotected IPA" >> will have the NS bit (#55) set by the RMM. By design, any MMIO access >> to an unprotected half is sent to the NS Host by RMM and any page >> the Realm wants to share with the Host must be in the Upper half >> of the IPA. >> >> What we do above is controlling the "Stage1" used by the Linux. i.e, >> for a given VA, we flip the Guest "PA" (in reality IPA) to the >> "Unprotected" alias. >> >> e.g., DTB describes a UART at address 0x10_0000 to Realm (with an IPA >> width of 40, like in the normal VM case), emulated by the host. Realm is >> trying to map this I/O address into Stage1 at VA. So we apply the >> BIT(39) as PROT_NS_SHARED while creating the Stage1 mapping. >> >> ie., VA == stage1 ==> BIT(39) | 0x10_0000 =(IPA)== > 0x80_10_0000 >> > 0x8000_10_0000 Yep, my bad. 
> >> Now, the Stage2 mapping won't be present for this IPA if it is emulated >> and thus an access to "VA" causes a Stage2 Abort to the Host, which the >> RMM allows the host to emulate. Otherwise a shared page would have been >> mapped by the Host (and NS bit set at Stage2 by RMM), allowing the >> data to be shared with the host. >> > > Thank you for the explanation and details. It really helps to understand > how the access fault to the unprotected space (upper half) is routed to NS > host, and then VMM (QEMU) for emulation. If the commit log can be improved > with those information, it will make reader easier to understand the code. > > I had the following call trace and it seems the address 0x8000_10_1000 is > converted to 0x10_0000 in [1], based on current code base (branch: > cca-full/v3). > At [1], the GPA is masked with kvm_gpa_stolen_bits() so that BIT#39 is > removed > in this particular case. > > kvm_vcpu_ioctl(KVM_RUN) // non-secured host > kvm_arch_vcpu_ioctl_run > kvm_rec_enter > rmi_rec_enter // -> SMC_RMI_REC_ENTER > : > rmm_handler // tf-rmm > handle_ns_smc > smc_rec_enter > rec_run_loop > run_realm > : > el2_vectors > el2_sync_lel > realm_exit > : > handle_realm_exit > handle_exception_sync > handle_data_abort > : > handle_rme_exit // non-secured host > rec_exit_sync_dabt > kvm_handle_guest_abort // -> [1] Correct. KVM deals with "GFN" and as such masks the "protection" bit, as the IPA is split. > gfn_to_memslot > io_mem_abort > kvm_io_bus_write // -> > run->exit_reason = KVM_EXIT_MMIO > > Another question is how the Granule Protection Check (GPC) table is > updated so > that the corresponding granule (0x8000_10_1000) to is accessible by NS > host? I > mean how the BIT#39 is synchronized to GPC table and translated to the > property > "granule is accessible by NS host". Good question. GPC is only applicable for memory accesses that are actually "made" (think of this as Stage3). 
In this case, the Stage2 walk causes an abort and as such the address is never "accessed" (unlike in a normal VM) and the host handles it. In the case of a "shared" memory page, the "stage2" mapping created *could* (RMM doesn't guarantee what gets mapped on the Unprotected alias) be mapped to a Granule in the NS PAS (normal world page), via RMI_RTT_MAP_UNPROTECTED. RMM only guarantees that the output of the Stage2 translation is "NS" PAS (the actual PAS of the Granule may not be NS, e.g. if we have a malicious host). Now, converting a "protected IPA" to "shared" (even though we don't share the same Physical page for the aliases with guest_mem, CCA doesn't prevent this) would take the following route:

Realm: IPA_STATE_SET(ipa, RIPAS_EMPTY) -> REC Exit -> Host reclaims the "PA" backing "IPA" via RMI_DATA_DESTROY -> Change PAS back to NS (RMI_GRANULE_UNDELEGATE)
Realm: Access BIT(39)|ipa => Stage2 Fault -> Host maps "BIT(39)|ipa" via RMI_RTT_MAP_UNPROTECTED.

The important things to remember:
1) "NS_PROT" is just a way to access the "Aliased IPA" in the UNPROTECTED half and is only a "Stage1" change.
2) It doesn't *change the PAS* of the backing PA implicitly.
3) It doesn't change the PAS of the resulting "Translation" at Stage2; instead it targets a "different IPA"->PA translation, and the resulting *access* is guaranteed to be NS PAS.

Suzuki > Thanks, > Gavin
diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 87e307804b99..f2c5e653562e 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -98,7 +98,7 @@ enum fixed_addresses {
 #define FIXADDR_TOT_SIZE (__end_of_fixed_addresses << PAGE_SHIFT)
 #define FIXADDR_TOT_START (FIXADDR_TOP - FIXADDR_TOT_SIZE)
 
-#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE)
+#define FIXMAP_PAGE_IO __pgprot(PROT_DEVICE_nGnRE | PROT_NS_SHARED)
 
 void __init early_fixmap_init(void);
 
diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 4ff0ae3f6d66..07fc1801c6ad 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -277,12 +277,12 @@ static inline void __const_iowrite64_copy(void __iomem *to, const void *from,
 
 #define ioremap_prot ioremap_prot
 
-#define _PAGE_IOREMAP PROT_DEVICE_nGnRE
+#define _PAGE_IOREMAP (PROT_DEVICE_nGnRE | PROT_NS_SHARED)
 
 #define ioremap_wc(addr, size) \
-	ioremap_prot((addr), (size), PROT_NORMAL_NC)
+	ioremap_prot((addr), (size), (PROT_NORMAL_NC | PROT_NS_SHARED))
 #define ioremap_np(addr, size) \
-	ioremap_prot((addr), (size), PROT_DEVICE_nGnRnE)
+	ioremap_prot((addr), (size), (PROT_DEVICE_nGnRnE | PROT_NS_SHARED))
 
 /*
  * io{read,write}{16,32,64}be() macros
@@ -303,7 +303,7 @@ static inline void __iomem *ioremap_cache(phys_addr_t addr, size_t size)
 	if (pfn_is_map_memory(__phys_to_pfn(addr)))
 		return (void __iomem *)__phys_to_virt(addr);
 
-	return ioremap_prot(addr, size, PROT_NORMAL);
+	return ioremap_prot(addr, size, PROT_NORMAL | PROT_NS_SHARED);
 }
 
 /*