Message ID | 1447034266-28003-2-git-send-email-okaya@codeaurora.org (mailing list archive)
---|---
State | New, archived
On 11/09/2015 02:57 AM, Sinan Kaya wrote:
> Current code gives up when 32 bit DMA is not supported.
> This problem has been observed on systems without any
> memory below 4 gig.
>
> This patch tests 64 bit support before bailing out to find
> a working combination.
>
That feels decidedly odd.

Why do you probe for 64bit if 32bit fails?
Typically it's the other way round, on the grounds that 64bit DMA
should be preferred over 32bit.
Can you explain why it needs to be done the other way round here?

Cheers,

Hannes
On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote:
> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
> > Current code gives up when 32 bit DMA is not supported.
> > This problem has been observed on systems without any
> > memory below 4 gig.
> >
> > This patch tests 64 bit support before bailing out to find
> > a working combination.
> >
> That feels decidedly odd.
>
> Why do you probe for 64bit if 32bit fails?

32-bit DMA mask on PCI cannot fail, we rely on that in all sorts
of places. If the machine has all of its RAM visible above 4GB
PCI bus addresses, it must use an IOMMU.

> Typically it's the other way round, on the grounds that 64bit DMA
> should be preferred over 32bit.
> Can you explain why it needs to be done the other way round here?

Something else is odd here, the driver already checks for
dma_get_required_mask(), which will return the smallest mask
that fits all of RAM. If the machine has any memory above 4GB,
it already uses the 64-bit mask, and only falls back to
the 32-bit mask if that fails or if all memory fits within the
first 4GB.

So both the description and the patch are wrong. :(

	Arnd
On 11/9/2015 2:09 AM, Hannes Reinecke wrote:
> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
>> Current code gives up when 32 bit DMA is not supported.
>> This problem has been observed on systems without any
>> memory below 4 gig.
>>
>> This patch tests 64 bit support before bailing out to find
>> a working combination.
>>
> That feels decidedly odd.
>
> Why do you probe for 64bit if 32bit fails?
> Typically it's the other way round, on the grounds that 64bit DMA
> should be preferred over 32bit.
> Can you explain why it needs to be done the other way round here?
>
> Cheers,
>
> Hannes
>

The platform does not have any memory below 4G, so 32 bit DMA is not
possible. I'm trying to use 64 bit DMA instead, since both the platform
and the hardware support it. The current code will not try 64 bit DMA
if 32 bit DMA is not working.
On 11/9/2015 3:59 AM, Arnd Bergmann wrote: > On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote: >> On 11/09/2015 02:57 AM, Sinan Kaya wrote: >>> Current code gives up when 32 bit DMA is not supported. >>> This problem has been observed on systems without any >>> memory below 4 gig. >>> >>> This patch tests 64 bit support before bailing out to find >>> a working combination. >>> >> That feels decidedly odd. >> >> Why do you probe for 64bit if 32bit fails? > > 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts > of places. If the machine has all of its RAM visible above 4GB > PCI bus addresses, it must use an IOMMU. Can you be specific? PCIe does not have this limitation. It supports 32 bit and 64 bit TLPs. I have not seen any limitation so far in the OS either. Using IOMMU is fine but not required if the endpoint is a true 64 bit supporting endpoint. This endpoint supports 64bit too. > >> Typically it's the other way round, on the grounds that 64bit DMA >> should be preferred over 32bit. >> Can you explain why it needs to be done the other way round here? > > Something else is odd here, the driver already checks for > dma_get_required_mask(), which will return the smallest mask > that fits all of RAM. If the machine has any memory above 4GB, > it already uses the 64-bit mask, and only falls back to > the 32-bit mask if that fails or if all memory fits within the > first 4GB. > I'll add some prints in the code to get to the bottom of it. I think the code is checking more than just the required mask and failing in one of the other conditions. At least that DMA comparison code was more than what I have ever seen. > So both the description and the patch are wrong. :( > > Arnd >
On Monday 09 November 2015 09:07:36 Sinan Kaya wrote: > > On 11/9/2015 3:59 AM, Arnd Bergmann wrote: > > On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote: > >> On 11/09/2015 02:57 AM, Sinan Kaya wrote: > >>> Current code gives up when 32 bit DMA is not supported. > >>> This problem has been observed on systems without any > >>> memory below 4 gig. > >>> > >>> This patch tests 64 bit support before bailing out to find > >>> a working combination. > >>> > >> That feels decidedly odd. > >> > >> Why do you probe for 64bit if 32bit fails? > > > > 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts > > of places. If the machine has all of its RAM visible above 4GB > > PCI bus addresses, it must use an IOMMU. > > Can you be specific? PCIe does not have this limitation. It supports 32 > bit and 64 bit TLPs. > > I have not seen any limitation so far in the OS either. See Documentation/DMA-API-HOWTO.txt All PCI devices start out with a 32-bit DMA mask, and Linux assumes it can always fall back to a 32-bit mask if a smaller mask (needed for some devices that only support DMA on a subset of the 4GB) or larger mask (needed to address high memory but can fail when the PCI host does not support it) cannot be set. > Using IOMMU is fine but not required if the endpoint is a true 64 bit > supporting endpoint. This endpoint supports 64bit too. There are two aspects here: a) setting a 32-bit mask should not have failed. Any device that actually needs 32-bit DMA will make the same call and the platform must guarantee that this works. If you have a broken platform that can't do this, complain to your board vendor so they wire up the RAM properly, with at least the first 2GB visible to low PCI bus addresses. b) If the platform has any memory above 4GB and the supports high DMA, it should never have asked for the 32-bit mask before trying the 64-bit mask first. This is only a performance optimization, but the driver seems to do the right thing, so I assume there is something wrong with the platform code. > >> Typically it's the other way round, on the grounds that 64bit DMA > >> should be preferred over 32bit. > >> Can you explain why it needs to be done the other way round here? > > > > Something else is odd here, the driver already checks for > > dma_get_required_mask(), which will return the smallest mask > > that fits all of RAM. If the machine has any memory above 4GB, > > it already uses the 64-bit mask, and only falls back to > > the 32-bit mask if that fails or if all memory fits within the > > first 4GB. > > > > I'll add some prints in the code to get to the bottom of it. I think the > code is checking more than just the required mask and failing in one of > the other conditions. At least that DMA comparison code was more than > what I have ever seen. Ok. That should take care of b) above, but we still have a bug with a) that will come back to bite you if you don't address it right. Arnd
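For reference, the fallback order Arnd describes is the pattern documented in
Documentation/DMA-API-HOWTO.txt: try the wide mask first, then fall back to the
32-bit mask that the platform is expected to always honor. A minimal sketch of
that pattern in a generic PCI probe path (example_setup_dma() is a hypothetical
name; this is illustrative only, not the mpt2sas code):

	static int example_setup_dma(struct pci_dev *pdev)
	{
		/* prefer full 64-bit streaming and coherent DMA */
		if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
			return 0;

		/*
		 * fall back to the 32-bit mask, which the platform is
		 * expected to always be able to satisfy for PCI devices
		 */
		if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32)))
			return 0;

		dev_warn(&pdev->dev, "no suitable DMA mask available\n");
		return -ENODEV;
	}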
On 11/9/2015 9:33 AM, Arnd Bergmann wrote: > On Monday 09 November 2015 09:07:36 Sinan Kaya wrote: >> >> On 11/9/2015 3:59 AM, Arnd Bergmann wrote: >>> On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote: >>>> On 11/09/2015 02:57 AM, Sinan Kaya wrote: >>>>> Current code gives up when 32 bit DMA is not supported. >>>>> This problem has been observed on systems without any >>>>> memory below 4 gig. >>>>> >>>>> This patch tests 64 bit support before bailing out to find >>>>> a working combination. >>>>> >>>> That feels decidedly odd. >>>> >>>> Why do you probe for 64bit if 32bit fails? >>> >>> 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts >>> of places. If the machine has all of its RAM visible above 4GB >>> PCI bus addresses, it must use an IOMMU. >> >> Can you be specific? PCIe does not have this limitation. It supports 32 >> bit and 64 bit TLPs. >> >> I have not seen any limitation so far in the OS either. > > See Documentation/DMA-API-HOWTO.txt > > All PCI devices start out with a 32-bit DMA mask, and Linux assumes it > can always fall back to a 32-bit mask if a smaller mask (needed for > some devices that only support DMA on a subset of the 4GB) or larger > mask (needed to address high memory but can fail when the PCI host > does not support it) cannot be set. > >> Using IOMMU is fine but not required if the endpoint is a true 64 bit >> supporting endpoint. This endpoint supports 64bit too. > > There are two aspects here: > > a) setting a 32-bit mask should not have failed. Any device that actually > needs 32-bit DMA will make the same call and the platform must > guarantee that this works. If you have a broken platform that can't > do this, complain to your board vendor so they wire up the RAM > properly, with at least the first 2GB visible to low PCI bus > addresses. > > b) If the platform has any memory above 4GB and the supports high DMA, > it should never have asked for the 32-bit mask before trying the > 64-bit mask first. This is only a performance optimization, but the > driver seems to do the right thing, so I assume there is something > wrong with the platform code. > >>>> Typically it's the other way round, on the grounds that 64bit DMA >>>> should be preferred over 32bit. >>>> Can you explain why it needs to be done the other way round here? >>> >>> Something else is odd here, the driver already checks for >>> dma_get_required_mask(), which will return the smallest mask >>> that fits all of RAM. If the machine has any memory above 4GB, >>> it already uses the 64-bit mask, and only falls back to >>> the 32-bit mask if that fails or if all memory fits within the >>> first 4GB. >>> >> >> I'll add some prints in the code to get to the bottom of it. I think the >> code is checking more than just the required mask and failing in one of >> the other conditions. At least that DMA comparison code was more than >> what I have ever seen. > > Ok. That should take care of b) above, but we still have a bug with a) > that will come back to bite you if you don't address it right. > > Arnd > B should have worked and it doesn't. B works for other drivers but not for this particular one. As you said, "it should never have asked for the 32-bit mask before trying the 64-bit mask first." this is the problem. This HW doesn't have support for a. If I need a, I am required to use IOMMU. Here is what I found. 
mpt2sas version 20.100.00.00 loaded
sizeof(dma_addr_t)                :8
ioc->dma_mask                     :0
dma_get_required_mask(&pdev->dev) :7fffffffff
mpt2sas0: no suitable DMA mask for 0002:01:00.0
mpt2sas0: failure at drivers/scsi/mpt2sas/mpt2sas_scsih.c:8498/_scsih_probe()!

Here is the code.

	if (ioc->dma_mask)
		consistent_dma_mask = DMA_BIT_MASK(64);
	else
		consistent_dma_mask = DMA_BIT_MASK(32);	<-- why here?

	if (sizeof(dma_addr_t) > 4) {			<-- true
		const uint64_t required_mask =
			dma_get_required_mask(&pdev->dev);
		if ((required_mask > DMA_BIT_MASK(32)) &&	<-- true
		    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&	<-- true
		    !pci_set_consistent_dma_mask(pdev,
		    consistent_dma_mask)) {			<-- false
			ioc->base_add_sg_single = &_base_add_sg_single_64;
			ioc->sge_size = sizeof(Mpi2SGESimple64_t);
			ioc->dma_mask = 64;
			goto out;
		}
	}

ioc->dma_mask is 0 and the driver is trying to use 32 bit even though 64 bit
is supported by the platform.

I think the proper fix is to pass the required_mask back to
consistent_dma_mask rather than using ioc->dma_mask and guessing:

	pci_set_consistent_dma_mask(pdev, required_mask)

My code was just a band aid for broken code. The driver worked after changing
only the above line.

mpt2sas version 20.100.00.00 loaded
sizeof(dma_addr_t)                :8
ioc->dma_mask                     :0
dma_get_required_mask(&pdev->dev) :7fffffffff
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16411112 kB)
mpt2sas0: MSI-X vectors supported: 1, no of cores: 24, max_msix_vectors: 8
mpt2sas0: IO-APIC enabled: IRQ 105
mpt2sas0: iomem(0x00000a01005c0000), mapped(0xffff0000008d8000), size(16384)
mpt2sas0: ioport(0x0000000000000000), size(0)
mpt2sas0: Allocated physical memory: size(3392 kB)
mpt2sas0: Current Controller Queue Depth(1483), Max Controller Queue Depth(1720)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2004: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.0
mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trac
scsi host0: Fusion MPT SAS Host
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x500062b0002de4f0), phys(8)
mpt2sas0: port enable: SUCCESS
scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD 845D 3X3Q PQ: 0 ANSI: 6
scsi 0:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), phy(1), device_name
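Spelled out, the one-line experiment Sinan describes would look roughly like the
block below (a hypothetical sketch, not the upstream driver code; as James
explains further down the thread, the driver keeps the 32-bit coherent mask on
purpose so the RDPQ pool allocations share the same upper 32 address bits):

	if (sizeof(dma_addr_t) > 4) {
		const uint64_t required_mask =
			dma_get_required_mask(&pdev->dev);

		if ((required_mask > DMA_BIT_MASK(32)) &&
		    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
		    /*
		     * let the coherent mask follow the platform's required
		     * mask instead of guessing from ioc->dma_mask
		     */
		    !pci_set_consistent_dma_mask(pdev, required_mask)) {
			ioc->base_add_sg_single = &_base_add_sg_single_64;
			ioc->sge_size = sizeof(Mpi2SGESimple64_t);
			ioc->dma_mask = 64;
			goto out;
		}
	}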
On 11/09/2015 05:22 PM, Sinan Kaya wrote:
>
> 	if (ioc->dma_mask)
> 		consistent_dma_mask = DMA_BIT_MASK(64);
> 	else
> 		consistent_dma_mask = DMA_BIT_MASK(32); <-- why here?

So this change is from this patch:

http://permalink.gmane.org/gmane.linux.kernel/1759343

Note that this was discussed back in February:

https://lkml.org/lkml/2015/2/20/2
On Monday 09 November 2015 18:22:22 Sinan Kaya wrote: > On 11/9/2015 9:33 AM, Arnd Bergmann wrote: > > On Monday 09 November 2015 09:07:36 Sinan Kaya wrote: > >> On 11/9/2015 3:59 AM, Arnd Bergmann wrote: > > ioc->dma_mask is 0 and the driver is trying to use 32 bit even though 64 > bit supported by the platform. Ok, makes sense. > I think the proper fix is to pass the required_mask back to > consistent_dma_mask rather than using ioc->dma_mask and guessing. > > pci_set_consistent_dma_mask(pdev, required_mask) > > My code was just a band aid for broken code. > No, as Timur found, the driver is correct and it intentionally sets the 32-bit mask, and that is guaranteed to work on all sane hardware. Don't change the driver but find a better platform for your workload, or talk to the people that are responsible for the platform and get them to fix it. If the platform also doesn't have an IOMMU, you can probably work around it by setting up the dma-ranges property of the PCI host to map the low PCI addresses to the start of RAM. This will also require changes in the bootloader to set up the PCI outbound translation, and it will require implementing the DMA offset on ARM64, which I was hoping to avoid. Arnd
On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > No, as Timur found, the driver is correct and it intentionally > sets the 32-bit mask, and that is guaranteed to work on all sane > hardware. Don't change the driver but find a better platform for > your workload, or talk to the people that are responsible for > the platform and get them to fix it. Platform does have an IOMMU. No issues there. I am trying to clean out the patch pipe I have in order to get this card working with and without IOMMU. > > If the platform also doesn't have an IOMMU, you can probably work > around it by setting up the dma-ranges property of the PCI host > to map the low PCI addresses to the start of RAM. This will also > require changes in the bootloader to set up the PCI outbound translation, > and it will require implementing the DMA offset on ARM64, which I was > hoping to avoid. From the email thread, it looks like this was introduced to support some legacy card that has 64 bit addressing limitations and is being carried around ("rotted") since then. I'm the second guy after the powerpc architecture complaining about the very same issue. Any red flags? I can't change the address map for PCIe. SBSA requires all inbound PCIe addresses to be non-translated. I'll just have to stick with IOMMU for this card.
On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: > On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > > No, as Timur found, the driver is correct and it intentionally > > sets the 32-bit mask, and that is guaranteed to work on all sane > > hardware. Don't change the driver but find a better platform for > > your workload, or talk to the people that are responsible for > > the platform and get them to fix it. > > Platform does have an IOMMU. No issues there. I am trying to clean out > the patch pipe I have in order to get this card working with and without > IOMMU. On PowerPC, I think we automatically enable the IOMMU whenever a DMA mask is set that doesn't cover all of the RAM. We could think about doing the same thing on ARM64 to make all devices work out of the box. > > If the platform also doesn't have an IOMMU, you can probably work > > around it by setting up the dma-ranges property of the PCI host > > to map the low PCI addresses to the start of RAM. This will also > > require changes in the bootloader to set up the PCI outbound translation, > > and it will require implementing the DMA offset on ARM64, which I was > > hoping to avoid. > > From the email thread, it looks like this was introduced to support > some legacy card that has 64 bit addressing limitations and is being > carried around ("rotted") since then. > > I'm the second guy after the powerpc architecture complaining about the > very same issue. Any red flags? What BenH was worried about here is that the driver sets different masks for streaming and coherent mappings, which is indeed a worry that could hit us on ARM as well, but I suppose we'll have to deal with that in platform code. Setting both masks to 32-bit is something that a lot of drivers do, and without IOMMU enabled, you'd hit the same bug on all of them. > I can't change the address map for PCIe. SBSA requires all inbound PCIe > addresses to be non-translated. What about changing the memory map? I suspect there will be more problems for you in the future when all of your RAM is at high addresses. Is this something you could fix in the bootloader by moving the first 2GB to a different CPU physical address? > I'll just have to stick with IOMMU for this card. Ok. But how do you currently decide whether to use the IOMMU or not? Arnd
On 11/10/2015 10:47 AM, Arnd Bergmann wrote:
> What BenH was worried about here is that the driver sets different masks
> for streaming and coherent mappings, which is indeed a worry that
> could hit us on ARM as well, but I suppose we'll have to deal with
> that in platform code.
>
> Setting both masks to 32-bit is something that a lot of drivers do,
> and without IOMMU enabled, you'd hit the same bug on all of them.

Also note that I think that on PowerPC, the mask is set to 32 by default
for all devices. I don't think we do that on ARM64. So on PowerPC,
some drivers get away with not explicitly setting the mask.
On 11/10/2015 11:47 AM, Arnd Bergmann wrote: > On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: >> On 11/10/2015 3:38 AM, Arnd Bergmann wrote: >> > No, as Timur found, the driver is correct and it intentionally >>> sets the 32-bit mask, and that is guaranteed to work on all sane >>> hardware. Don't change the driver but find a better platform for >>> your workload, or talk to the people that are responsible for >>> the platform and get them to fix it. >> >> Platform does have an IOMMU. No issues there. I am trying to clean out >> the patch pipe I have in order to get this card working with and without >> IOMMU. > > On PowerPC, I think we automatically enable the IOMMU whenever a DMA > mask is set that doesn't cover all of the RAM. We could think about > doing the same thing on ARM64 to make all devices work out of the box. > The ACPI IORT table declares whether you enable IOMMU for a particular device or not. The placement of IOMMU HW is system specific. The IORT table gives the IOMMU HW topology to the operating system. >>> If the platform also doesn't have an IOMMU, you can probably work >>> around it by setting up the dma-ranges property of the PCI host >>> to map the low PCI addresses to the start of RAM. This will also >>> require changes in the bootloader to set up the PCI outbound translation, >>> and it will require implementing the DMA offset on ARM64, which I was >>> hoping to avoid. >> >> From the email thread, it looks like this was introduced to support >> some legacy card that has 64 bit addressing limitations and is being >> carried around ("rotted") since then. >> >> I'm the second guy after the powerpc architecture complaining about the >> very same issue. Any red flags? > > What BenH was worried about here is that the driver sets different masks > for streaming and coherent mappings, which is indeed a worry that > could hit us on ARM as well, but I suppose we'll have to deal with > that in platform code. > > Setting both masks to 32-bit is something that a lot of drivers do, > and without IOMMU enabled, you'd hit the same bug on all of them. > Maybe, maybe not. This is the only card that I had problems with. >> I can't change the address map for PCIe. SBSA requires all inbound PCIe >> addresses to be non-translated. > > What about changing the memory map? I suspect there will be more > problems for you in the future when all of your RAM is at high > addresses. Is this something you could fix in the bootloader by > moving the first 2GB to a different CPU physical address? I'm thinking about this. > >> I'll just have to stick with IOMMU for this card. > > Ok. But how do you currently decide whether to use the IOMMU or not? > ACPI table. I wanted to get this fix in so that all operating systems whether they have IOMMU driver enabled or not would work. > Arnd >
On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote: > On 11/10/2015 11:47 AM, Arnd Bergmann wrote: > > On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: > >> On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > >> From the email thread, it looks like this was introduced to support > >> some legacy card that has 64 bit addressing limitations and is being > >> carried around ("rotted") since then. > >> > >> I'm the second guy after the powerpc architecture complaining about the > >> very same issue. Any red flags? > > > > What BenH was worried about here is that the driver sets different masks > > for streaming and coherent mappings, which is indeed a worry that > > could hit us on ARM as well, but I suppose we'll have to deal with > > that in platform code. > > > > Setting both masks to 32-bit is something that a lot of drivers do, > > and without IOMMU enabled, you'd hit the same bug on all of them. > > > > Maybe, maybe not. This is the only card that I had problems with. Your characterisation of "some legacy card" isn't entirely correct. Just to clarify how this happens, most I/O cards today are intelligent offload engines which means they have some type of embedded CPU (it can even be a specially designed asic). This CPU is driven by firmware which is mostly (but not always) in the machine language of the CPU. DMA transfers are sometimes run by this CPU, but mostly handed off to a separate offload engine. When the board gets revised, it's often easier to update the offload engine to 64 bits and keep the CPU at 32 (or even 16) bits. This means that all the internal addresses in the firmware are 32 bit only. As I read the comments in the original thread, it looks like the mpt people tried to mitigate this by using segment registers for external addresses firmware uses ... that's why they say that they don't have to have all the addresses in DMA32 ... they just need the upper 32 bits to be constant so they can correctly program the segment register. Unfortunately, we have no way to parametrise this to the DMA allocation code. You'll find the same thing with Adaptec SPI cards. Their route to 64 bits was via an initial 39 bit extension that had them layering the additional 7 bits into the unused lower region of the page descriptors for the firmware (keeping the actual pointers to DMA at 32 bits because they're always parametrised as address, offset, length and the address is always a 4k page). Eventually, everything will rev to 64 bits and this problem will go away, but, as I suspect you know, it takes time for the embedded world to get to where everyone else already is. As Arnd said, if you failed to allow for this in your platform, then oops, just don't use the card. I think this solution would be better than trying to get the driver to work out which cards can support 64 bit firmware descriptors and only failing on your platform for those that can't. James
On Tuesday 10 November 2015 11:00:59 Timur Tabi wrote: > On 11/10/2015 10:47 AM, Arnd Bergmann wrote: > > What BenH was worried about here is that the driver sets different masks > > for streaming and coherent mappings, which is indeed a worry that > > could hit us on ARM as well, but I suppose we'll have to deal with > > that in platform code. > > > > Setting both masks to 32-bit is something that a lot of drivers do, > > and without IOMMU enabled, you'd hit the same bug on all of them. > > Also note that I think that on PowerPC, the mask is set to 32 by default > for all devices. I don't think we do that on ARM64. So on PowerPC, > some drivers get away with not explicitly setting the mask. > If the mask is 64-bit by default on ARM64, that is a bug that we need to fix urgently. Can you verify this? A lot of PCI devices can only do 32-bit DMA, and we have plenty of drivers that don't bother setting a mask at all because the 32-bit mask is the default on all other architectures. Arnd
On 11/10/2015 1:27 PM, James Bottomley wrote: > On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote: >> On 11/10/2015 11:47 AM, Arnd Bergmann wrote: >>> On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: >>>> On 11/10/2015 3:38 AM, Arnd Bergmann wrote: >>>> From the email thread, it looks like this was introduced to support >>>> some legacy card that has 64 bit addressing limitations and is being >>>> carried around ("rotted") since then. >>>> >>>> I'm the second guy after the powerpc architecture complaining about the >>>> very same issue. Any red flags? >>> >>> What BenH was worried about here is that the driver sets different masks >>> for streaming and coherent mappings, which is indeed a worry that >>> could hit us on ARM as well, but I suppose we'll have to deal with >>> that in platform code. >>> >>> Setting both masks to 32-bit is something that a lot of drivers do, >>> and without IOMMU enabled, you'd hit the same bug on all of them. >>> >> >> Maybe, maybe not. This is the only card that I had problems with. > > Your characterisation of "some legacy card" isn't entirely correct. > Just to clarify how this happens, most I/O cards today are intelligent > offload engines which means they have some type of embedded CPU (it can > even be a specially designed asic). This CPU is driven by firmware > which is mostly (but not always) in the machine language of the CPU. > DMA transfers are sometimes run by this CPU, but mostly handed off to a > separate offload engine. When the board gets revised, it's often easier > to update the offload engine to 64 bits and keep the CPU at 32 (or even > 16) bits. This means that all the internal addresses in the firmware > are 32 bit only. As I read the comments in the original thread, it > looks like the mpt people tried to mitigate this by using segment > registers for external addresses firmware uses ... that's why they say > that they don't have to have all the addresses in DMA32 ... they just > need the upper 32 bits to be constant so they can correctly program the > segment register. Unfortunately, we have no way to parametrise this to > the DMA allocation code. > > You'll find the same thing with Adaptec SPI cards. Their route to 64 > bits was via an initial 39 bit extension that had them layering the > additional 7 bits into the unused lower region of the page descriptors > for the firmware (keeping the actual pointers to DMA at 32 bits because > they're always parametrised as address, offset, length and the address > is always a 4k page). > > Eventually, everything will rev to 64 bits and this problem will go > away, but, as I suspect you know, it takes time for the embedded world > to get to where everyone else already is. > > As Arnd said, if you failed to allow for this in your platform, then > oops, just don't use the card. I think this solution would be better > than trying to get the driver to work out which cards can support 64 bit > firmware descriptors and only failing on your platform for those that > can't. > > James > > James, I was referring to this conversation here. https://lkml.org/lkml/2015/2/20/31 "The aic79xx hardware problem was that the DMA engine could address the whole of memory (it had two address modes, a 39 bit one and a 64 bit one) but the script engine that runs the mailboxes only had a 32 bit activation register (the activating write points at the physical address of the script to begin executing)." The fact that LSI SAS 92118i is working with 64 bit addresses suggests me that this problem is already solved. 
I have not hit any kind of regressions with 93xx and 92xx families under load in a true 64 bit environment. I am only mentioning this based on my testing exposure. Another comment here from you. https://lkml.org/lkml/2015/4/2/28 "Well, it was originally a hack for altix, because they had no regions below 4GB and had to specifically manufacture them. As you know, in Linux, if Intel doesn't need it, no-one cares and the implementation bitrots." Maybe, it is time to fix the code for more recent (even decent) hardware? Sinan
On Tue, 2015-11-10 at 14:14 -0500, Sinan Kaya wrote: > > On 11/10/2015 1:27 PM, James Bottomley wrote: > > On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote: > >> On 11/10/2015 11:47 AM, Arnd Bergmann wrote: > >>> On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: > >>>> On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > >>>> From the email thread, it looks like this was introduced to support > >>>> some legacy card that has 64 bit addressing limitations and is being > >>>> carried around ("rotted") since then. > >>>> > >>>> I'm the second guy after the powerpc architecture complaining about the > >>>> very same issue. Any red flags? > >>> > >>> What BenH was worried about here is that the driver sets different masks > >>> for streaming and coherent mappings, which is indeed a worry that > >>> could hit us on ARM as well, but I suppose we'll have to deal with > >>> that in platform code. > >>> > >>> Setting both masks to 32-bit is something that a lot of drivers do, > >>> and without IOMMU enabled, you'd hit the same bug on all of them. > >>> > >> > >> Maybe, maybe not. This is the only card that I had problems with. > > > > Your characterisation of "some legacy card" isn't entirely correct. > > Just to clarify how this happens, most I/O cards today are intelligent > > offload engines which means they have some type of embedded CPU (it can > > even be a specially designed asic). This CPU is driven by firmware > > which is mostly (but not always) in the machine language of the CPU. > > DMA transfers are sometimes run by this CPU, but mostly handed off to a > > separate offload engine. When the board gets revised, it's often easier > > to update the offload engine to 64 bits and keep the CPU at 32 (or even > > 16) bits. This means that all the internal addresses in the firmware > > are 32 bit only. As I read the comments in the original thread, it > > looks like the mpt people tried to mitigate this by using segment > > registers for external addresses firmware uses ... that's why they say > > that they don't have to have all the addresses in DMA32 ... they just > > need the upper 32 bits to be constant so they can correctly program the > > segment register. Unfortunately, we have no way to parametrise this to > > the DMA allocation code. > > > > You'll find the same thing with Adaptec SPI cards. Their route to 64 > > bits was via an initial 39 bit extension that had them layering the > > additional 7 bits into the unused lower region of the page descriptors > > for the firmware (keeping the actual pointers to DMA at 32 bits because > > they're always parametrised as address, offset, length and the address > > is always a 4k page). > > > > Eventually, everything will rev to 64 bits and this problem will go > > away, but, as I suspect you know, it takes time for the embedded world > > to get to where everyone else already is. > > > > As Arnd said, if you failed to allow for this in your platform, then > > oops, just don't use the card. I think this solution would be better > > than trying to get the driver to work out which cards can support 64 bit > > firmware descriptors and only failing on your platform for those that > > can't. > > > > James > > > > > > James, > I was referring to this conversation here. 
> > https://lkml.org/lkml/2015/2/20/31 > > "The aic79xx hardware problem was that the DMA engine could address the > whole of memory (it had two address modes, a 39 bit one and a 64 bit > one) but the script engine that runs the mailboxes only had a 32 bit > activation register (the activating write points at the physical address > of the script to begin executing)." > > The fact that LSI SAS 92118i is working with 64 bit addresses suggests > me that this problem is already solved. I have not hit any kind of > regressions with 93xx and 92xx families under load in a true 64 bit > environment. I am only mentioning this based on my testing exposure. The Issue, as stated by LSI is Initially set the consistent DMA mask to 32 bit and then change it to 64 bit mask after allocating RDPQ pools by calling the function _base_change_consistent_dma_mask. This is to ensure that all the upper 32 bits of RDPQ entries's base address to be same. If you set a 64 bit coherent mask before this point, you're benefiting from being lucky that all the upper 32 bits of the allocations are the same ... we can't code a driver to rely on luck. Particularly not when the failure mode looks like it would be silent and deadly. > Another comment here from you. > https://lkml.org/lkml/2015/4/2/28 > > "Well, it was originally a hack for altix, because they had no regions > below 4GB and had to specifically manufacture them. As you know, in > Linux, if Intel doesn't need it, no-one cares and the implementation > bitrots." > > Maybe, it is time to fix the code for more recent (even decent) hardware? What do you mean "fix the code"? The code isn't broken, it's parametrising issues with particular hardware. There's no software work around (except allocating memory with the correct characteristics). James
On 11/10/2015 2:43 PM, James Bottomley wrote: > The Issue, as stated by LSI is > > Initially set the consistent DMA mask to 32 bit and then change > it > to 64 bit mask after allocating RDPQ pools by calling the > function > _base_change_consistent_dma_mask. This is to ensure that all the > upper 32 bits of RDPQ entries's base address to be same. > Need somebody from mpt to confirm that this behavior is still valid for the recent cards besides altix. > If you set a 64 bit coherent mask before this point, you're benefiting > from being lucky that all the upper 32 bits of the allocations are the > same ... we can't code a driver to rely on luck. Particularly not when > the failure mode looks like it would be silent and deadly. Of course nobody wants unreliable code. I'm wondering if I was just lucky during my testing or the 92xx and 93xx hardware supports full 64 bit range. I don't have any insight into what the endpoint does or what it is capable of. > >> >Another comment here from you. >> >https://lkml.org/lkml/2015/4/2/28 >> > >> >"Well, it was originally a hack for altix, because they had no regions >> >below 4GB and had to specifically manufacture them. As you know, in >> >Linux, if Intel doesn't need it, no-one cares and the implementation >> >bitrots." >> > >> >Maybe, it is time to fix the code for more recent (even decent) hardware? > What do you mean "fix the code"? The code isn't broken, it's > parametrising issues with particular hardware. There's no software work > around (except allocating memory with the correct characteristics). Need confirmation. I'm questioning if we are stuck with this behavior because of altix or something else. If the latter case, the code could have used PCI ID to do something special for it.
On Tuesday 10 November 2015 12:19:33 Sinan Kaya wrote: > On 11/10/2015 11:47 AM, Arnd Bergmann wrote: > > On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote: > >> On 11/10/2015 3:38 AM, Arnd Bergmann wrote: > >> > No, as Timur found, the driver is correct and it intentionally > >>> sets the 32-bit mask, and that is guaranteed to work on all sane > >>> hardware. Don't change the driver but find a better platform for > >>> your workload, or talk to the people that are responsible for > >>> the platform and get them to fix it. > >> > >> Platform does have an IOMMU. No issues there. I am trying to clean out > >> the patch pipe I have in order to get this card working with and without > >> IOMMU. > > > > On PowerPC, I think we automatically enable the IOMMU whenever a DMA > > mask is set that doesn't cover all of the RAM. We could think about > > doing the same thing on ARM64 to make all devices work out of the box. > > > > The ACPI IORT table declares whether you enable IOMMU for a particular > device or not. The placement of IOMMU HW is system specific. The IORT > table gives the IOMMU HW topology to the operating system. This sounds odd. Clearly you need to specify the IOMMU settings for each possible PCI device independent of whether the OS actually uses the IOMMU or not. In a lot of cases, we want to turn it off to get better performance when the driver has set a DMA mask that covers all of RAM, but you also want to enable the IOMMU for debugging purposes or for device assignment if you run virtual machines. The bootloader doesn't know how the device is going to be used, so it cannot define the policy here. Arnd
On Tue, 2015-11-10 at 14:56 -0500, Sinan Kaya wrote: > > On 11/10/2015 2:43 PM, James Bottomley wrote: > > The Issue, as stated by LSI is > > > > Initially set the consistent DMA mask to 32 bit and then change > > it > > to 64 bit mask after allocating RDPQ pools by calling the > > function > > _base_change_consistent_dma_mask. This is to ensure that all the > > upper 32 bits of RDPQ entries's base address to be same. > > > > Need somebody from mpt to confirm that this behavior is still valid for > the recent cards besides altix. OK, you don't seem to be understanding the problem: the Altix isn't a LSI card, it was a SGI platform. It was the platform where we first discovered the issue that a lot of storage cards didn't work because it by default had no memory below 4GB. The reason coherent masks were introduced was initially so the Altix could manufacture and manage a region of memory in the lower 4GB region and we would guarantee to make allocations from it so the storage cards would then work on that platform. I thought the Altix was a historical relic because after they disappeared, there was no other platform with this issue ... until you came along. James
On 11/10/2015 3:05 PM, James Bottomley wrote: > OK, you don't seem to be understanding the problem: the Altix isn't a > LSI card, it was a SGI platform. Got it. > It was the platform where we first > discovered the issue that a lot of storage cards didn't work because it > by default had no memory below 4GB. The reason coherent masks were > introduced was initially so the Altix could manufacture and manage a > region of memory in the lower 4GB region and we would guarantee to make > allocations from it so the storage cards would then work on that > platform. I can't fix the issue if the card cannot do 64 bit DMA when IOMMU is not there. I need IOMMU enabled all the time for this card.
On Tue, 2015-11-10 at 15:26 -0500, Sinan Kaya wrote: > > On 11/10/2015 3:05 PM, James Bottomley wrote: > > OK, you don't seem to be understanding the problem: the Altix isn't a > > LSI card, it was a SGI platform. > > Got it. > > > It was the platform where we first > > discovered the issue that a lot of storage cards didn't work because it > > by default had no memory below 4GB. The reason coherent masks were > > introduced was initially so the Altix could manufacture and manage a > > region of memory in the lower 4GB region and we would guarantee to make > > allocations from it so the storage cards would then work on that > > platform. > > I can't fix the issue if the card cannot do 64 bit DMA when IOMMU is not > there. I need IOMMU enabled all the time for this card. That depends on the limitations of your platform. The Altix only used an iommu to manufacture the coherent memory for the descriptors, but the card itself mostly operated in bypass mode (using 64 bit physical addresses rather than iommu remapped ones), so all accesses except for the few firmware descriptor ones didn't use an iommu. Apparently this was for performance reasons. So, to recap, the card itself *can* do 64 bit DMA. The limitation is that the RDPQ descriptors need to be all in the same region of 4G memory and the way the driver ensures this is to set the 64 bit coherent mask after using the 32 bit one to allocate the RDPQ pools. James
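To make that constraint concrete, the sequencing James describes (and the LSI
statement quoted earlier) amounts to allocating the RDPQ pools while the
coherent mask is still 32 bit, then widening it afterwards. A rough,
hypothetical sketch of that order of operations, not copied from the driver's
_base_change_consistent_dma_mask() path:

	static int example_alloc_rdpq_pools(struct pci_dev *pdev)
	{
		int rc;

		/* streaming DMA can use the full 64-bit range from the start */
		rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
		if (rc)
			return rc;

		/*
		 * keep coherent allocations below 4GB for now so every RDPQ
		 * pool ends up with the same upper 32 address bits
		 */
		rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
		if (rc)
			return rc;

		/*
		 * ... dma_alloc_coherent() the reply descriptor post queue
		 * pools here ...
		 */

		/* only now is it safe to raise the coherent mask to 64 bits */
		return pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
	}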
On 11/10/2015 2:56 PM, Arnd Bergmann wrote: >> The ACPI IORT table declares whether you enable IOMMU for a particular >> >device or not. The placement of IOMMU HW is system specific. The IORT >> >table gives the IOMMU HW topology to the operating system. > This sounds odd. Clearly you need to specify the IOMMU settings for each > possible PCI device independent of whether the OS actually uses the IOMMU > or not. There are provisions to have DMA mask in the PCIe host bridge not at the PCIe device level inside IORT table. This setting is specific for each PCIe bus. It is not per PCIe device. It is assumed that the endpoint device driver knows the hardware for PCIe devices. The driver can also query the supported DMA bits by this platform via DMA APIs and will request the correct DMA mask from the DMA subsystem (surprise!). >In a lot of cases, we want to turn it off to get better performance > when the driver has set a DMA mask that covers all of RAM, but you > also want to enable the IOMMU for debugging purposes or for device > assignment if you run virtual machines. The bootloader doesn't know how > the device is going to be used, so it cannot define the policy here. I think we'll end up adding a virtualization option to the UEFI BIOS similar to how Intel platforms work. Based on this switch, we'll end up patching the ACPI table. If I remove the IORT entry, then the device is in coherent mode with device accessing the full RAM range. If I have the IORT table, the device is in IOMMU translation mode. Details are in the IORT spec.
On 11/10/2015 01:13 PM, Arnd Bergmann wrote:
> If the mask is 64-bit by default on ARM64, that is a bug that we need
> to fix urgently. Can you verify this?

I think the mask is 0 by default, because there's no code in ARM64 that
actually sets the mask.

Take a look at arch_setup_pdev_archdata() in
arch/powerpc/kernel/setup-common.c.

	void arch_setup_pdev_archdata(struct platform_device *pdev)
	{
		pdev->archdata.dma_mask = DMA_BIT_MASK(32);
		pdev->dev.dma_mask = &pdev->archdata.dma_mask;
		set_dma_ops(&pdev->dev, &dma_direct_ops);
	}

I don't see anything equivalent in arch/arm64

> A lot of PCI devices can only do 32-bit DMA, and we have plenty
> of drivers that don't bother setting a mask at all because the 32-bit
> mask is the default on all other architectures.

In our drivers for 32-bit devices, we have to explicitly set the DMA
mask to 32-bits in order to get any DMA working.
On Tuesday 10 November 2015 15:03:59 Timur Tabi wrote:
> On 11/10/2015 01:13 PM, Arnd Bergmann wrote:
> > If the mask is 64-bit by default on ARM64, that is a bug that we need
> > to fix urgently. Can you verify this?
>
> I think the mask is 0 by default, because there's no code in ARM64 that
> actually sets the mask.
>
> Take a look at arch_setup_pdev_archdata() in
> arch/powerpc/kernel/setup-common.c.
>
> 	void arch_setup_pdev_archdata(struct platform_device *pdev)
> 	{
> 		pdev->archdata.dma_mask = DMA_BIT_MASK(32);
> 		pdev->dev.dma_mask = &pdev->archdata.dma_mask;
> 		set_dma_ops(&pdev->dev, &dma_direct_ops);
> 	}
>
> I don't see anything equivalent in arch/arm64

of_dma_configure() sets up an initial DMA mask for 32 bits. The same
thing happens for pci_setup_device() in architecture-independent code?

> > A lot of PCI devices can only do 32-bit DMA, and we have plenty
> > of drivers that don't bother setting a mask at all because the 32-bit
> > mask is the default on all other architectures.
>
> In our drivers for 32-bit devices, we have to explicitly set the DMA
> mask to 32-bits in order to get any DMA working.

Do you mean PCI devices or platform devices? Maybe the parent bus is
lacking a dma-ranges property?

	Arnd
On 11/10/2015 03:54 PM, Arnd Bergmann wrote:

>> In our drivers for 32-bit devices, we have to explicitly set the DMA
>> mask to 32-bits in order to get any DMA working.
>
> Do you mean PCI devices or platform devices?

Platform.

> Maybe the parent bus is lacking a dma-ranges property?

All of this applies only on device-tree platforms. Sinan and I are
working on an ACPI server platform. So we never call
of_dma_configure(), and we don't have a dma-ranges property.
On Tuesday 10 November 2015 15:58:19 Sinan Kaya wrote: > > On 11/10/2015 2:56 PM, Arnd Bergmann wrote: > >> The ACPI IORT table declares whether you enable IOMMU for a particular > >> >device or not. The placement of IOMMU HW is system specific. The IORT > >> >table gives the IOMMU HW topology to the operating system. > > This sounds odd. Clearly you need to specify the IOMMU settings for each > > possible PCI device independent of whether the OS actually uses the IOMMU > > or not. > > There are provisions to have DMA mask in the PCIe host bridge not at the > PCIe device level inside IORT table. This setting is specific for each > PCIe bus. It is not per PCIe device. Same thing, I meant the bootloader must provide all the information that is needed to use the IOMMU on all PCI devices. I don't care where the IOMMU driver gets that information. Some IOMMUs require programming a bus/device/function specific number into the I/O page tables, and they might not always have the same algorithm to map from the PCI numbers into their own number space. > It is assumed that the endpoint device driver knows the hardware for > PCIe devices. The driver can also query the supported DMA bits by this > platform via DMA APIs and will request the correct DMA mask from the DMA > subsystem (surprise!). I know how the negotiation works. Note that dma_get_required_mask() will only tell you what mask the device needs to access all of memory, while both the device and bus may have additional limitations, and there is not always a solution. > >In a lot of cases, we want to turn it off to get better performance > > when the driver has set a DMA mask that covers all of RAM, but you > > also want to enable the IOMMU for debugging purposes or for device > > assignment if you run virtual machines. The bootloader doesn't know how > > the device is going to be used, so it cannot define the policy here. > > I think we'll end up adding a virtualization option to the UEFI BIOS > similar to how Intel platforms work. Based on this switch, we'll end up > patching the ACPI table. > > If I remove the IORT entry, then the device is in coherent mode with > device accessing the full RAM range. > > If I have the IORT table, the device is in IOMMU translation mode. > > Details are in the IORT spec. I think that would suck a lot more than being slightly out of spec regarding SBSA if you make the low PCI addresses map to the start of RAM. Asking users to select a 'virtualization' option based on what kind of PCI device and kernel version they have is a major hassle. Arnd
On Tuesday 10 November 2015 15:59:18 Timur Tabi wrote: > On 11/10/2015 03:54 PM, Arnd Bergmann wrote: > > >> In our drivers for 32-bit devices, we have to explicitly set the DMA > >> mask to 32-bits in order to get any DMA working. > > > > Do you mean PCI devices or platform devices? > > Platform. > > > Maybe the parent bus is lacking a dma-ranges property? > > All of this applies only on device-tree platforms. Sinan and I are > working on an ACPI server platform. So we never call > of_dma_configure(), and we don't have a dma-ranges property. ACPI must have something else to mark DMA master devices and their capabilities, right? The platform should initialize the dma_mask pointer to a per-device mask and set both the dma_mask and dma_coherent_mask to 32 bits for any DMA master device, and the device driver should override that to be an appropriate mask based on its needs later. Arnd
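A sketch of the kind of per-device default Arnd is asking for, modeled on the
powerpc arch_setup_pdev_archdata() that Timur quoted earlier. This is
hypothetical illustration only, not existing arch/arm64 or ACPI code:

	static void example_setup_default_dma(struct device *dev)
	{
		/*
		 * give the device a dma_mask pointer if the bus code has
		 * not provided one already
		 */
		if (!dev->dma_mask)
			dev->dma_mask = &dev->coherent_dma_mask;

		/*
		 * default both the streaming and coherent masks to 32 bits;
		 * drivers that can do more raise them later in probe()
		 */
		*dev->dma_mask = DMA_BIT_MASK(32);
		dev->coherent_dma_mask = DMA_BIT_MASK(32);
	}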
diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index c167911..c61c82a 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -1217,8 +1217,27 @@ _base_config_dma_addressing(struct MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev)
 		ioc->base_add_sg_single = &_base_add_sg_single_32;
 		ioc->sge_size = sizeof(Mpi2SGESimple32_t);
 		ioc->dma_mask = 32;
-	} else
+	} else {
+		/* Try 64 bit, 32 bit failed */
+		consistent_dma_mask = DMA_BIT_MASK(64);
+
+		if (sizeof(dma_addr_t) > 4) {
+			const uint64_t required_mask =
+				dma_get_required_mask(&pdev->dev);
+			if ((required_mask > DMA_BIT_MASK(32)) &&
+			    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+			    !pci_set_consistent_dma_mask(pdev,
+			    consistent_dma_mask)) {
+				ioc->base_add_sg_single =
+					&_base_add_sg_single_64;
+				ioc->sge_size = sizeof(Mpi2SGESimple64_t);
+				ioc->dma_mask = 64;
+				goto out;
+			}
+		}
+		return -ENODEV;
+	}
 
 out:
 	si_meminfo(&s);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index d4f1dcd..6dc391c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1535,9 +1535,29 @@ _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
 		ioc->base_add_sg_single = &_base_add_sg_single_32;
 		ioc->sge_size = sizeof(Mpi2SGESimple32_t);
 		ioc->dma_mask = 32;
-	} else
+	} else {
+		/* Try 64 bit, 32 bit failed */
+		consistent_dma_mask = DMA_BIT_MASK(64);
+		if (sizeof(dma_addr_t) > 4) {
+			const uint64_t required_mask =
+				dma_get_required_mask(&pdev->dev);
+			int consistent_mask =
+				pci_set_consistent_dma_mask(pdev,
+				consistent_dma_mask);
+
+			if ((required_mask > DMA_BIT_MASK(32)) &&
+			    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+			    !consistent_mask) {
+				ioc->base_add_sg_single =
+					&_base_add_sg_single_64;
+				ioc->sge_size = sizeof(Mpi2SGESimple64_t);
+				ioc->dma_mask = 64;
+				goto out;
+			}
+		}
 		return -ENODEV;
+	}
 
 out:
 	si_meminfo(&s);
 	pr_info(MPT3SAS_FMT
Current code gives up when 32 bit DMA is not supported.
This problem has been observed on systems without any
memory below 4 gig.

This patch tests 64 bit support before bailing out to find
a working combination.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c | 21 ++++++++++++++++++++-
 drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +++++++++++++++++++++-
 2 files changed, 41 insertions(+), 2 deletions(-)