Message ID: 20241004180405.555194-1-yang@os.amperecomputing.com (mailing list archive)
State: New, archived
Series: [v3] iommu/arm-smmu-v3: Fix L1 stream table index calculation for 32-bit sid size
On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
> -	u32 size;
> +	u64 size;
> 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> +	size = num_sids * sizeof(struct arm_smmu_ste);
> +	/* The max size for dmam_alloc_coherent() is 32-bit */

I'd remove this comment. I assume the intent here was to say that the
maximum size is 4GB (not 32 bit). I also can't find any reference to
this limitation. Where does dmam_alloc_coherent() limit the size of an
allocation to 4GB? Also, this comment might not be applicable to 64
bit platforms.

> +	if (size > SIZE_MAX)
> +		return -EINVAL;

I'm assuming this is for platforms where the range of a u64 is larger
than that of a size_t type? If we're printing an error message if an
allocation fails (i.e. "failed to allocate linear stream table (%llu
bytes)\n"), then we might also want to print an error message here.

> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
> +	cfg->linear.num_ents = num_sids;

If you're worried about 32 bit platforms, then I'm wondering if this
also needs some attention. cfg->linear.num_ents is defined as an
unsigned int and num_sids could potentially be outside the range of an
unsigned int on 32 bit platforms.

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 1e9952ca989f..c8ceddc5e8ef 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
> 	ioasid_t ssid;
>  };
>
> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
> +{
> +	return (1ULL << smmu->sid_bits);
> +}
> +

I'm wondering if it makes sense to move this up and put it right
before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
functions are in one place.

On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
tables. I'm thinking it would make sense to limit the size of linear
stream tables for the same reasons.
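The truncation in Daniel's last point can be reproduced in isolation
(a standalone sketch for this discussion, not driver code):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* A 32-bit SID space yields 2^32 entries, which does not fit
	 * in the driver's 32-bit num_ents field. */
	uint64_t num_sids = 1ULL << 32;
	unsigned int num_ents = num_sids;	/* silently truncates to 0 */

	printf("num_sids=%llu num_ents=%u\n",
	       (unsigned long long)num_sids, num_ents);
	return 0;
}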
On 10/4/24 2:14 PM, Daniel Mentz wrote:
> On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>>  {
>> -	u32 size;
>> +	u64 size;
>> 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>> +
>> +	size = num_sids * sizeof(struct arm_smmu_ste);
>> +	/* The max size for dmam_alloc_coherent() is 32-bit */
> I'd remove this comment. I assume the intent here was to say that the
> maximum size is 4GB (not 32 bit). I also can't find any reference to
> this limitation. Where does dmam_alloc_coherent() limit the size of an
> allocation to 4GB? Also, this comment might not be applicable to 64
> bit platforms.

The "size" parameter passed to dmam_alloc_coherent() is size_t type,
which is unsigned int.

>> +	if (size > SIZE_MAX)
>> +		return -EINVAL;
> I'm assuming this is for platforms where the range of a u64 is larger
> than that of a size_t type? If we're printing an error message if an
> allocation fails (i.e. "failed to allocate linear stream table (%llu
> bytes)\n"), then we might also want to print an error message here.

Thanks for the suggestion. Yes, we can if it really helps.

>> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
>> +	cfg->linear.num_ents = num_sids;
> If you're worried about 32 bit platforms, then I'm wondering if this
> also needs some attention. cfg->linear.num_ents is defined as an
> unsigned int and num_sids could potentially be outside the range of an
> unsigned int on 32 bit platforms.

The (size > SIZE_MAX) check can guarantee that an excessively large
num_sids won't reach here.

>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> index 1e9952ca989f..c8ceddc5e8ef 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>> 	ioasid_t ssid;
>>  };
>>
>> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
>> +{
>> +	return (1ULL << smmu->sid_bits);
>> +}
>> +
> I'm wondering if it makes sense to move this up and put it right
> before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
> functions are in one place.

I did it. But the function uses struct arm_smmu_device, which is
defined after those arm_smmu_strtab_* helpers. I have to put the
helper after the struct arm_smmu_device definition to avoid a compile
error. We may consider re-organizing the header file to group them
better, but I don't think it is urgent enough, and it seems out of the
scope of the bug fix patch. I really want to have the bug fix land
upstream ASAP.

> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
> tables. I'm thinking it would make sense to limit the size of linear
> stream tables for the same reasons.

Yes, this also works. But I don't know what value should be used.
Jason actually suggested (size > SIZE_512M) in the v2 review, but I
thought the value is a magic number. Why 512M? Just because it is too
large for allocation. So I picked SIZE_MAX, just because it is the
largest size supported by the size_t type.
On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote:
> On 10/4/24 2:14 PM, Daniel Mentz wrote:
> > On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
> >>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
> >>  {
> >> -	u32 size;
> >> +	u64 size;
> >> 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> >> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> >> +
> >> +	size = num_sids * sizeof(struct arm_smmu_ste);
> >> +	/* The max size for dmam_alloc_coherent() is 32-bit */
> > I'd remove this comment. I assume the intent here was to say that the
> > maximum size is 4GB (not 32 bit). I also can't find any reference to
> > this limitation. Where does dmam_alloc_coherent() limit the size of an
> > allocation to 4GB? Also, this comment might not be applicable to 64
> > bit platforms.
>
> The "size" parameter passed to dmam_alloc_coherent() is size_t type,
> which is unsigned int.

I believe that this is true only for 32 bit platforms. On arm64,
unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor
of removing that comment, because it's not applicable to arm64.

> >> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
> >> +	cfg->linear.num_ents = num_sids;
> > If you're worried about 32 bit platforms, then I'm wondering if this
> > also needs some attention. cfg->linear.num_ents is defined as an
> > unsigned int and num_sids could potentially be outside the range of an
> > unsigned int on 32 bit platforms.
>
> The (size > SIZE_MAX) check can guarantee that an excessively large
> num_sids won't reach here.

Now that I think about it, unsigned int is 32 bit even on arm64. So,
I'm afraid this could (theoretically) overflow. On arm64, I don't
think that the (size > SIZE_MAX) check will prevent this.

> >> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> index 1e9952ca989f..c8ceddc5e8ef 100644
> >> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> >> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
> >> 	ioasid_t ssid;
> >>  };
> >>
> >> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
> >> +{
> >> +	return (1ULL << smmu->sid_bits);
> >> +}
> >> +
> > I'm wondering if it makes sense to move this up and put it right
> > before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
> > functions are in one place.
>
> I did it. But the function uses struct arm_smmu_device, which is
> defined after those arm_smmu_strtab_* helpers. I have to put the
> helper after the struct arm_smmu_device definition to avoid a compile
> error. We may consider re-organizing the header file to group them
> better, but I don't think it is urgent enough, and it seems out of the
> scope of the bug fix patch. I really want to have the bug fix land
> upstream ASAP.

Understood. Thanks. We could move the changes in
arm_smmu_init_strtab_linear() into a separate patch to accelerate the
process. I'm fine either way, though. I don't want to get in the way
of this landing upstream.

> > On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
> > number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
> > tables. I'm thinking it would make sense to limit the size of linear
> > stream tables for the same reasons.
>
> Yes, this also works. But I don't know what value should be used.
> Jason actually suggested (size > SIZE_512M) in the v2 review, but I
> thought the value is a magic number. Why 512M? Just because it is too
> large for allocation. So I picked SIZE_MAX, just because it is the
> largest size supported by the size_t type.

I think it should be capped to STRTAB_MAX_L1_ENTRIES
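Daniel's arm64 point can also be checked in isolation (a standalone
sketch for this discussion, not driver code): on an LP64 target a u64
can never exceed SIZE_MAX, so the guard compiles to nothing.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t size = UINT64_MAX;	/* worst case */

	/* On LP64 targets such as arm64, size_t is 64-bit, so
	 * SIZE_MAX == UINT64_MAX and this branch is unreachable.
	 * Only on 32-bit targets (SIZE_MAX == UINT32_MAX) does the
	 * comparison reject anything. */
	if (size > SIZE_MAX)
		printf("rejected\n");
	else
		printf("accepted: the check cannot fire on LP64\n");
	return 0;
}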
On 10/4/24 6:03 PM, Daniel Mentz wrote:
> On Fri, Oct 4, 2024 at 2:47 PM Yang Shi <yang@os.amperecomputing.com> wrote:
>> On 10/4/24 2:14 PM, Daniel Mentz wrote:
>>> On Fri, Oct 4, 2024 at 11:04 AM Yang Shi <yang@os.amperecomputing.com> wrote:
>>>>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>>>>  {
>>>> -	u32 size;
>>>> +	u64 size;
>>>> 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
>>>> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>>>> +
>>>> +	size = num_sids * sizeof(struct arm_smmu_ste);
>>>> +	/* The max size for dmam_alloc_coherent() is 32-bit */
>>> I'd remove this comment. I assume the intent here was to say that the
>>> maximum size is 4GB (not 32 bit). I also can't find any reference to
>>> this limitation. Where does dmam_alloc_coherent() limit the size of an
>>> allocation to 4GB? Also, this comment might not be applicable to 64
>>> bit platforms.
>> The "size" parameter passed to dmam_alloc_coherent() is size_t type,
>> which is unsigned int.
> I believe that this is true only for 32 bit platforms. On arm64,
> unsigned int is 32 bit, whereas size_t is 64 bit. I'm still in favor
> of removing that comment, because it's not applicable to arm64.

Thanks for figuring this out. Yes, you are right. I missed this point.

>>>> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
>>>> +	cfg->linear.num_ents = num_sids;
>>> If you're worried about 32 bit platforms, then I'm wondering if this
>>> also needs some attention. cfg->linear.num_ents is defined as an
>>> unsigned int and num_sids could potentially be outside the range of an
>>> unsigned int on 32 bit platforms.
>> The (size > SIZE_MAX) check can guarantee that an excessively large
>> num_sids won't reach here.
> Now that I think about it, unsigned int is 32 bit even on arm64. So,
> I'm afraid this could (theoretically) overflow. On arm64, I don't
> think that the (size > SIZE_MAX) check will prevent this.

Yes, SIZE_MAX is ~(size_t)0, but size_t is unsigned long on ARM64. So
the check actually doesn't do what I expect it to do. U32_MAX should
be used.

>>>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> index 1e9952ca989f..c8ceddc5e8ef 100644
>>>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
>>>> @@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
>>>> 	ioasid_t ssid;
>>>>  };
>>>>
>>>> +static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
>>>> +{
>>>> +	return (1ULL << smmu->sid_bits);
>>>> +}
>>>> +
>>> I'm wondering if it makes sense to move this up and put it right
>>> before arm_smmu_strtab_l1_idx(). That way, all the arm_smmu_strtab_*
>>> functions are in one place.
>> I did it. But the function uses struct arm_smmu_device, which is
>> defined after those arm_smmu_strtab_* helpers. I have to put the
>> helper after the struct arm_smmu_device definition to avoid a compile
>> error. We may consider re-organizing the header file to group them
>> better, but I don't think it is urgent enough, and it seems out of the
>> scope of the bug fix patch. I really want to have the bug fix land
>> upstream ASAP.
> Understood. Thanks. We could move the changes in
> arm_smmu_init_strtab_linear() into a separate patch to accelerate the
> process. I'm fine either way, though. I don't want to get in the way
> of this landing upstream.

Thank you for your understanding.

>>> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
>>> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
>>> tables. I'm thinking it would make sense to limit the size of linear
>>> stream tables for the same reasons.
>> Yes, this also works. But I don't know what value should be used.
>> Jason actually suggested (size > SIZE_512M) in the v2 review, but I
>> thought the value is a magic number. Why 512M? Just because it is too
>> large for allocation. So I picked SIZE_MAX, just because it is the
>> largest size supported by the size_t type.
> I think it should be capped to STRTAB_MAX_L1_ENTRIES

I'm not an expert on SMMU. Does the linear stream table have the same
cap as the 2-level stream table? Is this defined by the hardware spec?
If it is not, why should we pick this value?
On Fri, Oct 4, 2024 at 6:53 PM Yang Shi <yang@os.amperecomputing.com> wrote:
> >>> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
> >>> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
> >>> tables. I'm thinking it would make sense to limit the size of linear
> >>> stream tables for the same reasons.
> >> Yes, this also works. But I don't know what value should be used.
> >> Jason actually suggested (size > SIZE_512M) in the v2 review, but I
> >> thought the value is a magic number. Why 512M? Just because it is too
> >> large for allocation. So I picked SIZE_MAX, just because it is the
> >> largest size supported by the size_t type.
> > I think it should be capped to STRTAB_MAX_L1_ENTRIES
>
> I'm not an expert on SMMU. Does the linear stream table have the same
> cap as the 2-level stream table? Is this defined by the hardware spec?
> If it is not, why should we pick this value?

No. I don't think it's defined by the architecture specification. I
don't have a strong opinion on the particular value for the size limit
of linear Stream tables. However, I do believe that we should pick a
size limit. Today, the driver limits the number of Level-1 Stream
Table Descriptors in a 2-level Stream table. For consistency, we
should limit the size of linear Stream tables, too.
On 10/7/24 9:36 AM, Daniel Mentz wrote:
> On Fri, Oct 4, 2024 at 6:53 PM Yang Shi <yang@os.amperecomputing.com> wrote:
>>>>> On a related note, in arm_smmu_init_strtab_2lvl() we're capping the
>>>>> number of l1 entries at STRTAB_MAX_L1_ENTRIES for 2 level stream
>>>>> tables. I'm thinking it would make sense to limit the size of linear
>>>>> stream tables for the same reasons.
>>>> Yes, this also works. But I don't know what value should be used.
>>>> Jason actually suggested (size > SIZE_512M) in the v2 review, but I
>>>> thought the value is a magic number. Why 512M? Just because it is too
>>>> large for allocation. So I picked SIZE_MAX, just because it is the
>>>> largest size supported by the size_t type.
>>> I think it should be capped to STRTAB_MAX_L1_ENTRIES
>> I'm not an expert on SMMU. Does the linear stream table have the same
>> cap as the 2-level stream table? Is this defined by the hardware spec?
>> If it is not, why should we pick this value?
> No. I don't think it's defined by the architecture specification. I
> don't have a strong opinion on the particular value for the size limit
> of linear Stream tables. However, I do believe that we should pick a
> size limit. Today, the driver limits the number of Level-1 Stream
> Table Descriptors in a 2-level Stream table. For consistency, we
> should limit the size of linear Stream tables, too.

We are on the same page regarding having a size limit. Taking a look
at the definition of STRTAB_MAX_L1_ENTRIES, I saw this comment:

/*
 * Stream table.
 *
 * Linear: Enough to cover 1 << IDR1.SIDSIZE entries
 * 2lvl: 128k L1 entries,
 *       256 lazy entries per table (each table covers a PCI bus)
 */

I'm not sure whether the comment for linear is still true with a large
IDR1.SIDSIZE, but using STRTAB_MAX_L1_ENTRIES for linear does conflict
with it. I will pick U32_MAX for now, since it is the largest size on
32-bit and good enough to prevent the overflow.
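A sketch of the shape such a guard could take in
arm_smmu_init_strtab_linear() (an assumed follow-up, not a posted
patch; U32_MAX per the message above):

	size = num_sids * sizeof(struct arm_smmu_ste);
	/*
	 * Not an architectural limit: U32_MAX simply fits a 32-bit
	 * size_t and keeps num_sids small enough that the later
	 * assignment to the u32 cfg->linear.num_ents cannot truncate.
	 */
	if (size > U32_MAX)
		return -EINVAL;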
On Fri, Oct 04, 2024 at 06:53:25PM -0700, Yang Shi wrote:
> Yes, SIZE_MAX is ~(size_t)0, but size_t is unsigned long on ARM64. So
> the check actually doesn't do what I expect it to do. U32_MAX should
> be used.

SIZE_MAX is right:

static inline void *dmam_alloc_coherent(struct device *dev, size_t size,
					dma_addr_t *dma_handle, gfp_t gfp)

It is up to dmam_alloc_coherent() to not truncate its arguments; when
you pass a u64 bounded to SIZE_MAX, you guarantee that size will be a
valid value.

> > I think it should be capped to STRTAB_MAX_L1_ENTRIES
>
> I'm not an expert on SMMU. Does the linear stream table have the same
> cap as the 2-level stream table? Is this defined by the hardware spec?
> If it is not, why should we pick this value?

Well, the way the driver works is that in the 2-level mode it caps the
whole table to STRTAB_MAX_L1_ENTRIES * STRTAB_NUM_L2_STES total SIDs,
which is 0x2000000, or 25 bits.

I guess there is a reasonable argument that linear and 2-level should
have the same software-enforced max size. Though that would put the
max linear STE table at 2G, which is still larger than Linux can
allocate, so it doesn't really make any practical difference compared
to SIZE_MAX.

Jason
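Spelling out the arithmetic behind Jason's numbers (constants as
defined in arm-smmu-v3.h at the time of this thread, with a 64-byte
STE):

  STRTAB_MAX_L1_ENTRIES * STRTAB_NUM_L2_STES = (1 << 17) * (1 << 8)
                                             = 1 << 25
                                             = 0x2000000 SIDs

  linear table at that cap: (1 << 25) * 64 bytes = 2 GiB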
On 10/7/24 10:50 AM, Jason Gunthorpe wrote:
> On Fri, Oct 04, 2024 at 06:53:25PM -0700, Yang Shi wrote:
>
>> Yes, SIZE_MAX is ~(size_t)0, but size_t is unsigned long on ARM64. So
>> the check actually doesn't do what I expect it to do. U32_MAX should
>> be used.
> SIZE_MAX is right:
>
> static inline void *dmam_alloc_coherent(struct device *dev, size_t size,
> 					dma_addr_t *dma_handle, gfp_t gfp)
>
> It is up to dmam_alloc_coherent() to not truncate its arguments; when
> you pass a u64 bounded to SIZE_MAX, you guarantee that size will be a
> valid value.

SIZE_MAX is fine for dmam_alloc_coherent(). The concern from Daniel is
that the later assignment to cfg->linear.num_ents, which is unsigned
int, may truncate the value.

If I read the code correctly, dmam_alloc_coherent() may not be
guaranteed to fail for a huge allocation, for example, when a very
large CMA area is configured. It is ridiculous, but it is allowed.
Please correct me if I'm wrong. So the concern seems valid to me,
which is why I proposed U32_MAX.

>>> I think it should be capped to STRTAB_MAX_L1_ENTRIES
>> I'm not an expert on SMMU. Does the linear stream table have the same
>> cap as the 2-level stream table? Is this defined by the hardware spec?
>> If it is not, why should we pick this value?
> Well, the way the driver works is that in the 2-level mode it caps the
> whole table to STRTAB_MAX_L1_ENTRIES * STRTAB_NUM_L2_STES total SIDs,
> which is 0x2000000, or 25 bits.
>
> I guess there is a reasonable argument that linear and 2-level should
> have the same software-enforced max size. Though that would put the
> max linear STE table at 2G, which is still larger than Linux can
> allocate, so it doesn't really make any practical difference compared
> to SIZE_MAX.
>
> Jason
Hi folks,

Sorry I'm late to the party, I went fishing.

On Fri, Oct 04, 2024 at 11:04:05AM -0700, Yang Shi wrote:
> The commit ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> calculated the last index of the L1 stream table by 1 << smmu->sid_bits.
> 1 is a 32-bit value. However some platforms, for example, AmpereOne and
> the platforms with ARM MMU-700, have a 32-bit stream id size. This
> resulted in an out-of-bound shift. The disassembly of the shift is:
>
> ldr w2, [x19, 828] //, smmu_7(D)->sid_bits
> mov w20, 1
> lsl w20, w20, w2
>
> According to the ARM spec, if the registers are 32 bit, the instruction
> actually does:
>
> dest = src << (shift % 32)
>
> So it actually shifted by zero bits.
>
> The out-of-bound shift is also undefined behavior according to the C
> language standard.
>
> This caused v6.12-rc1 to fail to boot on such platforms.
>
> UBSAN also reported:
>
> UBSAN: shift-out-of-bounds in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:3628:29
> shift exponent 32 is too large for 32-bit type 'int'
>
> Using a 64-bit immediate when doing the shift solves the problem. The
> disassembly after the fix looks like:
>
> ldr w20, [x19, 828] //, smmu_7(D)->sid_bits
> mov x0, 1
> lsl x0, x0, x20
>
> There are a couple of problematic places, so the shift is extracted
> into a helper.
>
> Fixes: ce410410f1a7 ("iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()")
> Tested-by: James Morse <james.morse@arm.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 +++++++++++-----
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  5 +++++
>  2 files changed, 16 insertions(+), 5 deletions(-)

[...]

> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 737c5b882355..9d4fc91d9258 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>  {
>  	u32 l1size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
>  	unsigned int last_sid_idx =
> -		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
> +		arm_smmu_strtab_l1_idx(num_sids - 1);
>
>  	/* Calculate the L1 size, capped to the SIDSIZE. */
>  	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
> @@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
>
>  static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  {
> -	u32 size;
> +	u64 size;
>  	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
> +	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
> +
> +	size = num_sids * sizeof(struct arm_smmu_ste);
> +	/* The max size for dmam_alloc_coherent() is 32-bit */
> +	if (size > SIZE_MAX)
> +		return -EINVAL;
>
> -	size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
>  	cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
>  						&cfg->linear.ste_dma,
>  						GFP_KERNEL);
>  	if (!cfg->linear.table) {
>  		dev_err(smmu->dev,
> -			"failed to allocate linear stream table (%u bytes)\n",
> +			"failed to allocate linear stream table (%llu bytes)\n",
>  			size);
>  		return -ENOMEM;
>  	}
> -	cfg->linear.num_ents = 1 << smmu->sid_bits;
> +	cfg->linear.num_ents = num_sids;

This all looks a bit messy to me. The architecture guarantees that
2-level stream tables are supported once we hit 7-bit SIDs and,
although the driver relaxes this to > 8-bit SIDs, we'll never run into
overflow problems in the linear table code above.

So I'm inclined to take Daniel's one-liner [1], which just chucks the
'ULL' suffix into the 2-level case. Otherwise, we're in a weird
situation where the size is 64-bit for a short while until it gets
truncated anyway when we assign it to a 32-bit field.

Any objections?

Will

[1] https://lore.kernel.org/r/20241002015357.1766934-1-danielmentz@google.com
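For reference, Daniel's one-liner amounts to the following change in
arm_smmu_init_strtab_2lvl() (reconstructed from the description in
this thread; the posted patch is at the lore link above):

-	unsigned int last_sid_idx =
-		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+	unsigned int last_sid_idx =
+		arm_smmu_strtab_l1_idx((1ULL << smmu->sid_bits) - 1);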
On Tue, Oct 08, 2024 at 02:34:58PM +0100, Will Deacon wrote:
> This all looks a bit messy to me. The architecture guarantees that
> 2-level stream tables are supported once we hit 7-bit SIDs and,
> although the driver relaxes this to > 8-bit SIDs, we'll never run into
> overflow problems in the linear table code above.

My original point was about the confidential compute position (sigh)
that the untrusted hypervisor should not corrupt the driver.

So your statement is architecturally true, but we never check that
IDR0_ST_LVL_2LVL is set if IDR1_SIDSIZE > 2**7, and so we can get into
this situation where the hypervisor could trigger some kind of bad
behavior.

> So I'm inclined to take Daniel's one-liner [1], which just chucks the
> 'ULL' suffix into the 2-level case. Otherwise, we're in a weird

I think you should take it and let better be for the CC crowd.

Jason
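For the record, a sketch of the kind of hardening check Jason is
describing (hypothetical; it reuses the driver's existing
ARM_SMMU_FEAT_2_LVL_STRTAB flag and is not part of any posted patch):

	/*
	 * Don't trust (possibly hostile) virtual hardware that
	 * reports a large SID space without advertising 2-level
	 * stream table support, which the architecture requires.
	 */
	if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) &&
	    smmu->sid_bits > 8) {
		dev_err(smmu->dev,
			"%u-bit SID space requires 2-level stream table support\n",
			smmu->sid_bits);
		return -ENXIO;
	}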
On 10/8/24 8:15 AM, Jason Gunthorpe wrote:
> On Tue, Oct 08, 2024 at 02:34:58PM +0100, Will Deacon wrote:
>
>> This all looks a bit messy to me. The architecture guarantees that
>> 2-level stream tables are supported once we hit 7-bit SIDs and,
>> although the driver relaxes this to > 8-bit SIDs, we'll never run into
>> overflow problems in the linear table code above.
> My original point was about the confidential compute position (sigh)
> that the untrusted hypervisor should not corrupt the driver.
>
> So your statement is architecturally true, but we never check that
> IDR0_ST_LVL_2LVL is set if IDR1_SIDSIZE > 2**7, and so we can get into
> this situation where the hypervisor could trigger some kind of bad
> behavior.

Jason's concern seems valid to me, IMHO. But if the simpler version is
preferred, I'd suggest at least adding some comments, or the check
suggested by Jason, to make the architectural guarantee more clear,
just in case someone else repeats what we did here because they see
"1ULL" in the 2lvl code but not in the linear code.

>> So I'm inclined to take Daniel's one-liner [1], which just chucks the
>> 'ULL' suffix into the 2-level case. Otherwise, we're in a weird
> I think you should take it and let better be for the CC crowd.
>
> Jason
On Tue, Oct 08, 2024 at 10:04:50AM -0700, Yang Shi wrote:
> On 10/8/24 8:15 AM, Jason Gunthorpe wrote:
> > On Tue, Oct 08, 2024 at 02:34:58PM +0100, Will Deacon wrote:
> >
> > > This all looks a bit messy to me. The architecture guarantees that
> > > 2-level stream tables are supported once we hit 7-bit SIDs and,
> > > although the driver relaxes this to > 8-bit SIDs, we'll never run
> > > into overflow problems in the linear table code above.
> > My original point was about the confidential compute position (sigh)
> > that the untrusted hypervisor should not corrupt the driver.
> >
> > So your statement is architecturally true, but we never check that
> > IDR0_ST_LVL_2LVL is set if IDR1_SIDSIZE > 2**7, and so we can get into
> > this situation where the hypervisor could trigger some kind of bad
> > behavior.
>
> Jason's concern seems valid to me, IMHO. But if the simpler version is
> preferred, I'd suggest at least adding some comments, or the check
> suggested by Jason, to make the architectural guarantee more clear,
> just in case someone else repeats what we did here because they see
> "1ULL" in the 2lvl code but not in the linear code.

In principle, I'm not opposed to hardening the driver against
malicious virtual hardware, but I don't think we need to add comments
everywhere we rely on architectural behaviour. The hardening should
also be done separately from fixing a bug affecting users with real
hardware.

Will
On Tue, Oct 08, 2024 at 12:15:06PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 08, 2024 at 02:34:58PM +0100, Will Deacon wrote:
>
> > This all looks a bit messy to me. The architecture guarantees that
> > 2-level stream tables are supported once we hit 7-bit SIDs and,
> > although the driver relaxes this to > 8-bit SIDs, we'll never run
> > into overflow problems in the linear table code above.
>
> My original point was about the confidential compute position (sigh)
> that the untrusted hypervisor should not corrupt the driver.
>
> So your statement is architecturally true, but we never check that
> IDR0_ST_LVL_2LVL is set if IDR1_SIDSIZE > 2**7, and so we can get into
> this situation where the hypervisor could trigger some kind of bad
> behavior.
>
> > So I'm inclined to take Daniel's one-liner [1], which just chucks the
> > 'ULL' suffix into the 2-level case. Otherwise, we're in a weird
>
> I think you should take it and let better be for the CC crowd.

Heh. I wish them luck! :p

Will
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 737c5b882355..9d4fc91d9258 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3624,8 +3624,9 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 {
 	u32 l1size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
 	unsigned int last_sid_idx =
-		arm_smmu_strtab_l1_idx((1 << smmu->sid_bits) - 1);
+		arm_smmu_strtab_l1_idx(num_sids - 1);
 
 	/* Calculate the L1 size, capped to the SIDSIZE. */
 	cfg->l2.num_l1_ents = min(last_sid_idx + 1, STRTAB_MAX_L1_ENTRIES);
@@ -3655,20 +3656,25 @@ static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu)
 
 static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 {
-	u32 size;
+	u64 size;
 	struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg;
+	u64 num_sids = arm_smmu_strtab_num_sids(smmu);
+
+	size = num_sids * sizeof(struct arm_smmu_ste);
+	/* The max size for dmam_alloc_coherent() is 32-bit */
+	if (size > SIZE_MAX)
+		return -EINVAL;
 
-	size = (1 << smmu->sid_bits) * sizeof(struct arm_smmu_ste);
 	cfg->linear.table = dmam_alloc_coherent(smmu->dev, size,
 						&cfg->linear.ste_dma,
 						GFP_KERNEL);
 	if (!cfg->linear.table) {
 		dev_err(smmu->dev,
-			"failed to allocate linear stream table (%u bytes)\n",
+			"failed to allocate linear stream table (%llu bytes)\n",
 			size);
 		return -ENOMEM;
 	}
-	cfg->linear.num_ents = 1 << smmu->sid_bits;
+	cfg->linear.num_ents = num_sids;
 
 	arm_smmu_init_initial_stes(cfg->linear.table, cfg->linear.num_ents);
 	return 0;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 1e9952ca989f..c8ceddc5e8ef 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -853,6 +853,11 @@ struct arm_smmu_master_domain {
 	ioasid_t ssid;
 };
 
+static inline u64 arm_smmu_strtab_num_sids(struct arm_smmu_device *smmu)
+{
+	return (1ULL << smmu->sid_bits);
+}
+
 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct arm_smmu_domain, domain);