[V2,1/3] scsi: mptxsas: try 64 bit DMA when 32 bit DMA fails

Message ID 1447034266-28003-2-git-send-email-okaya@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Sinan Kaya Nov. 9, 2015, 1:57 a.m. UTC
Current code gives up when 32 bit DMA is not supported.
This problem has been observed on systems without any
memory below 4 gig.

This patch tests 64 bit support before bailing out to find
a working combination.

Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
---
 drivers/scsi/mpt2sas/mpt2sas_base.c | 21 ++++++++++++++++++++-
 drivers/scsi/mpt3sas/mpt3sas_base.c | 22 +++++++++++++++++++++-
 2 files changed, 41 insertions(+), 2 deletions(-)

Comments

Hannes Reinecke Nov. 9, 2015, 7:09 a.m. UTC | #1
On 11/09/2015 02:57 AM, Sinan Kaya wrote:
> Current code gives up when 32 bit DMA is not supported.
> This problem has been observed on systems without any
> memory below 4 gig.
> 
> This patch tests 64 bit support before bailing out to find
> a working combination.
> 
That feels decidedly odd.

Why do you probe for 64bit if 32bit fails?
Typically it's the other way round, on the grounds that 64bit DMA
should be preferred over 32bit.
Can you explain why it needs to be done the other way round here?

Cheers,

Hannes
Arnd Bergmann Nov. 9, 2015, 8:59 a.m. UTC | #2
On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote:
> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
> > Current code gives up when 32 bit DMA is not supported.
> > This problem has been observed on systems without any
> > memory below 4 gig.
> > 
> > This patch tests 64 bit support before bailing out to find
> > a working combination.
> > 
> That feels decidedly odd.
> 
> Why do you probe for 64bit if 32bit fails?

32-bit DMA mask on PCI cannot fail, we rely on that in all sorts
of places. If the machine has all of its RAM visible above 4GB
PCI bus addresses, it must use an IOMMU.

> Typically it's the other way round, on the grounds that 64bit DMA
> should be preferred over 32bit.
> Can you explain why it needs to be done the other way round here?

Something else is odd here, the driver already checks for
dma_get_required_mask(), which will return the smallest mask
that fits all of RAM. If the machine has any memory above 4GB,
it already uses the 64-bit mask, and only falls back to
the 32-bit mask if that fails or if all memory fits within the
first 4GB.

So both the description and the patch are wrong. :(

	Arnd
Sinan Kaya Nov. 9, 2015, 2 p.m. UTC | #3
On 11/9/2015 2:09 AM, Hannes Reinecke wrote:
> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
>> Current code gives up when 32 bit DMA is not supported.
>> This problem has been observed on systems without any
>> memory below 4 gig.
>>
>> This patch tests 64 bit support before bailing out to find
>> a working combination.
>>
> That feels decidedly odd.
>
> Why do you probe for 64bit if 32bit fails?
> Typically it's the other way round, on the grounds that 64bit DMA
> should be preferred over 32bit.
> Can you explain why it needs to be done the other way round here?
>
> Cheers,
>
> Hannes
>

The platform does not have any memory below 4G. So, 32 bit DMA is not 
possible. I'm trying to use 64 bit DMA instead since both the platform 
and hardware support it. Current code will not try 64 bit DMA if 32 bit 
DMA is not working.
Sinan Kaya Nov. 9, 2015, 2:07 p.m. UTC | #4
On 11/9/2015 3:59 AM, Arnd Bergmann wrote:
> On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote:
>> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
>>> Current code gives up when 32 bit DMA is not supported.
>>> This problem has been observed on systems without any
>>> memory below 4 gig.
>>>
>>> This patch tests 64 bit support before bailing out to find
>>> a working combination.
>>>
>> That feels decidedly odd.
>>
>> Why do you probe for 64bit if 32bit fails?
>
> 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts
> of places. If the machine has all of its RAM visible above 4GB
> PCI bus addresses, it must use an IOMMU.

Can you be specific? PCIe does not have this limitation. It supports 32 
bit and 64 bit TLPs.

I have not seen any limitation so far in the OS either.

Using IOMMU is fine but not required if the endpoint is a true 64 bit 
supporting endpoint. This endpoint supports 64bit too.

>
>> Typically it's the other way round, on the grounds that 64bit DMA
>> should be preferred over 32bit.
>> Can you explain why it needs to be done the other way round here?
>
> Something else is odd here, the driver already checks for
> dma_get_required_mask(), which will return the smallest mask
> that fits all of RAM. If the machine has any memory above 4GB,
> it already uses the 64-bit mask, and only falls back to
> the 32-bit mask if that fails or if all memory fits within the
> first 4GB.
>

I'll add some prints in the code to get to the bottom of it. I think the 
code is checking more than just the required mask and failing in one of 
the other conditions. At least that DMA comparison code was more than 
what I have ever seen.


> So both the description and the patch are wrong. :(
>
> 	Arnd
>
Arnd Bergmann Nov. 9, 2015, 2:33 p.m. UTC | #5
On Monday 09 November 2015 09:07:36 Sinan Kaya wrote:
> 
> On 11/9/2015 3:59 AM, Arnd Bergmann wrote:
> > On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote:
> >> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
> >>> Current code gives up when 32 bit DMA is not supported.
> >>> This problem has been observed on systems without any
> >>> memory below 4 gig.
> >>>
> >>> This patch tests 64 bit support before bailing out to find
> >>> a working combination.
> >>>
> >> That feels decidedly odd.
> >>
> >> Why do you probe for 64bit if 32bit fails?
> >
> > 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts
> > of places. If the machine has all of its RAM visible above 4GB
> > PCI bus addresses, it must use an IOMMU.
> 
> Can you be specific? PCIe does not have this limitation. It supports 32 
> bit and 64 bit TLPs.
> 
> I have not seen any limitation so far in the OS either.

See Documentation/DMA-API-HOWTO.txt

All PCI devices start out with a 32-bit DMA mask, and Linux assumes it
can always fall back to a 32-bit mask if a smaller mask (needed for
some devices that only support DMA on a subset of the 4GB) or larger
mask (needed to address high memory but can fail when the PCI host
does not support it) cannot be set.
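
(For reference, the usual convention looks roughly like this; a minimal
sketch assuming a PCI probe path, with an illustrative helper name that is
not taken from the mpt drivers:)

#include <linux/pci.h>
#include <linux/dma-mapping.h>

/* Prefer 64-bit DMA, fall back to 32-bit; a 32-bit failure is treated as
 * fatal because Linux assumes it cannot happen on PCI. */
static int example_setup_dma(struct pci_dev *pdev)
{
	if (!dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
		return 0;	/* 64-bit streaming + coherent DMA available */

	/* Expected to always succeed on sane PCI platforms. */
	return dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
}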

> Using IOMMU is fine but not required if the endpoint is a true 64 bit 
> supporting endpoint. This endpoint supports 64bit too.

There are two aspects here:

a) setting a 32-bit mask should not have failed. Any device that actually
   needs 32-bit DMA will make the same call and the platform must
   guarantee that this works. If you have a broken platform that can't
   do this, complain to your board vendor so they wire up the RAM
   properly, with at least the first 2GB visible to low PCI bus
   addresses.

b) If the platform has any memory above 4GB and the device supports high DMA,
   it should never have asked for the 32-bit mask before trying the
   64-bit mask first. This is only a performance optimization, but the
   driver seems to do the right thing, so I assume there is something
   wrong with the platform code.

> >> Typically it's the other way round, on the grounds that 64bit DMA
> >> should be preferred over 32bit.
> >> Can you explain why it needs to be done the other way round here?
> >
> > Something else is odd here, the driver already checks for
> > dma_get_required_mask(), which will return the smallest mask
> > that fits all of RAM. If the machine has any memory above 4GB,
> > it already uses the 64-bit mask, and only falls back to
> > the 32-bit mask if that fails or if all memory fits within the
> > first 4GB.
> >
> 
> I'll add some prints in the code to get to the bottom of it. I think the 
> code is checking more than just the required mask and failing in one of 
> the other conditions. At least that DMA comparison code was more than 
> what I have ever seen.

Ok. That should take care of b) above, but we still have a bug with a)
that will come back to bite you if you don't address it right.

	Arnd
Sinan Kaya Nov. 9, 2015, 11:22 p.m. UTC | #6
On 11/9/2015 9:33 AM, Arnd Bergmann wrote:
> On Monday 09 November 2015 09:07:36 Sinan Kaya wrote:
>>
>> On 11/9/2015 3:59 AM, Arnd Bergmann wrote:
>>> On Monday 09 November 2015 08:09:39 Hannes Reinecke wrote:
>>>> On 11/09/2015 02:57 AM, Sinan Kaya wrote:
>>>>> Current code gives up when 32 bit DMA is not supported.
>>>>> This problem has been observed on systems without any
>>>>> memory below 4 gig.
>>>>>
>>>>> This patch tests 64 bit support before bailing out to find
>>>>> a working combination.
>>>>>
>>>> That feels decidedly odd.
>>>>
>>>> Why do you probe for 64bit if 32bit fails?
>>>
>>> 32-bit DMA mask on PCI cannot fail, we rely on that in all sorts
>>> of places. If the machine has all of its RAM visible above 4GB
>>> PCI bus addresses, it must use an IOMMU.
>>
>> Can you be specific? PCIe does not have this limitation. It supports 32
>> bit and 64 bit TLPs.
>>
>> I have not seen any limitation so far in the OS either.
>
> See Documentation/DMA-API-HOWTO.txt
>
> All PCI devices start out with a 32-bit DMA mask, and Linux assumes it
> can always fall back to a 32-bit mask if a smaller mask (needed for
> some devices that only support DMA on a subset of the 4GB) or larger
> mask (needed to address high memory but can fail when the PCI host
> does not support it) cannot be set.
>
>> Using IOMMU is fine but not required if the endpoint is a true 64 bit
>> supporting endpoint. This endpoint supports 64bit too.
>
> There are two aspects here:
>
> a) setting a 32-bit mask should not have failed. Any device that actually
>     needs 32-bit DMA will make the same call and the platform must
>     guarantee that this works. If you have a broken platform that can't
>     do this, complain to your board vendor so they wire up the RAM
>     properly, with at least the first 2GB visible to low PCI bus
>     addresses.
>
> b) If the platform has any memory above 4GB and the device supports high DMA,
>     it should never have asked for the 32-bit mask before trying the
>     64-bit mask first. This is only a performance optimization, but the
>     driver seems to do the right thing, so I assume there is something
>     wrong with the platform code.
>
>>>> Typically it's the other way round, on the grounds that 64bit DMA
>>>> should be preferred over 32bit.
>>>> Can you explain why it needs to be done the other way round here?
>>>
>>> Something else is odd here, the driver already checks for
>>> dma_get_required_mask(), which will return the smallest mask
>>> that fits all of RAM. If the machine has any memory above 4GB,
>>> it already uses the 64-bit mask, and only falls back to
>>> the 32-bit mask if that fails or if all memory fits within the
>>> first 4GB.
>>>
>>
>> I'll add some prints in the code to get to the bottom of it. I think the
>> code is checking more than just the required mask and failing in one of
>> the other conditions. At least that DMA comparison code was more than
>> what I have ever seen.
>
> Ok. That should take care of b) above, but we still have a bug with a)
> that will come back to bite you if you don't address it right.
>
> 	Arnd
>

(b) should have worked and it doesn't. (b) works for other drivers but not 
for this particular one.

As you said,
"it should never have asked for the 32-bit mask before trying the
64-bit mask first."

this is the problem.

This HW doesn't have support for (a). If I need (a), I am required to use 
the IOMMU. Here is what I found.

mpt2sas version 20.100.00.00 loaded
sizeof(dma_addr_t):8
ioc->dma_mask :0
dma_get_required_mask(&pdev->dev) :7fffffffff
mpt2sas0: no suitable DMA mask for 0002:01:00.0
mpt2sas0: failure at 
drivers/scsi/mpt2sas/mpt2sas_scsih.c:8498/_scsih_probe()!


Here is the code.

if (ioc->dma_mask)
	consistent_dma_mask = DMA_BIT_MASK(64);
else
	consistent_dma_mask = DMA_BIT_MASK(32); 	<-- why here?

if (sizeof(dma_addr_t) > 4) {			<-- true
	const uint64_t required_mask =
	    dma_get_required_mask(&pdev->dev);
	if ((required_mask > DMA_BIT_MASK(32)) &&	<-- true
	    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && <-- true
	    !pci_set_consistent_dma_mask(pdev, consistent_dma_mask)) { <-- false
		ioc->base_add_sg_single = &_base_add_sg_single_64;
		ioc->sge_size = sizeof(Mpi2SGESimple64_t);
		ioc->dma_mask = 64;
		goto out;
	}
}

ioc->dma_mask is 0 and the driver is trying to use 32 bit even though 64 
bit is supported by the platform.

I think the proper fix is to pass the required_mask back to 
consistent_dma_mask rather than using ioc->dma_mask and guessing.

pci_set_consistent_dma_mask(pdev, required_mask)
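
(To make that concrete, the excerpt above with only that call changed would
read roughly as follows; an untested sketch, not a submitted patch:)

if (sizeof(dma_addr_t) > 4) {
	const uint64_t required_mask =
	    dma_get_required_mask(&pdev->dev);
	if ((required_mask > DMA_BIT_MASK(32)) &&
	    !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
	    /* use the mask the platform actually requires instead of
	     * guessing 32 vs 64 bit from ioc->dma_mask */
	    !pci_set_consistent_dma_mask(pdev, required_mask)) {
		ioc->base_add_sg_single = &_base_add_sg_single_64;
		ioc->sge_size = sizeof(Mpi2SGESimple64_t);
		ioc->dma_mask = 64;
		goto out;
	}
}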

My code was just a band aid for broken code.

The driver worked after changing only the above line.

mpt2sas version 20.100.00.00 loaded
sizeof(dma_addr_t) :8
ioc->dma_mask :0
dma_get_required_mask(&pdev->dev) :7fffffffff
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (16411112 kB)
mpt2sas0: MSI-X vectors supported: 1, no of cores: 24, max_msix_vectors: 8
mpt2sas0: IO-APIC enabled: IRQ 105
mpt2sas0: iomem(0x00000a01005c0000), mapped(0xffff0000008d8000), size(16384)
mpt2sas0: ioport(0x0000000000000000), size(0)
mpt2sas0: Allocated physical memory: size(3392 kB)
mpt2sas0: Current Controller Queue Depth(1483), Max Controller Queue 
Depth(1720)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2004: FWVersion(17.00.01.00), ChipRevision(0x03), 
BiosVersion(07.33.0
mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot 
Buffer,Diag Trac
scsi host0: Fusion MPT SAS Host
mpt2sas0: sending port enable !!
mpt2sas0: host_add: handle(0x0001), sas_addr(0x500062b0002de4f0), phys(8)
mpt2sas0: port enable: SUCCESS
scsi 0:0:0:0: Direct-Access     ATA      Samsung SSD 845D 3X3Q PQ: 0 ANSI: 6
scsi 0:0:0:0: SATA: handle(0x0009), sas_addr(0x4433221101000000), 
phy(1), device_name
Timur Tabi Nov. 9, 2015, 11:29 p.m. UTC | #7
On 11/09/2015 05:22 PM, Sinan Kaya wrote:
>
> if (ioc->dma_mask)
>      consistent_dma_mask = DMA_BIT_MASK(64);
> else
>      consistent_dma_mask = DMA_BIT_MASK(32);     <-- why here?

So this change is from this patch:

	http://permalink.gmane.org/gmane.linux.kernel/1759343

Note that this was discussed back in February:

	https://lkml.org/lkml/2015/2/20/2
Arnd Bergmann Nov. 10, 2015, 8:38 a.m. UTC | #8
On Monday 09 November 2015 18:22:22 Sinan Kaya wrote:
> On 11/9/2015 9:33 AM, Arnd Bergmann wrote:
> > On Monday 09 November 2015 09:07:36 Sinan Kaya wrote:
> >> On 11/9/2015 3:59 AM, Arnd Bergmann wrote:
> 
> ioc->dma_mask is 0 and the driver is trying to use 32 bit even though 64 
> bit is supported by the platform.

Ok, makes sense.

> I think the proper fix is to pass the required_mask back to 
> consistent_dma_mask rather than using ioc->dma_mask and guessing.
> 
> pci_set_consistent_dma_mask(pdev, required_mask)
> 
> My code was just a band aid for broken code.
> 

No, as Timur found, the driver is correct and it intentionally
sets the 32-bit mask, and that is guaranteed to work on all sane
hardware. Don't change the driver but find a better platform for
your workload, or talk to the people that are responsible for
the platform and get them to fix it.

If the platform also doesn't have an IOMMU, you can probably work
around it by setting up the dma-ranges property of the PCI host
to map the low PCI addresses to the start of RAM. This will also
require changes in the bootloader to set up the PCI outbound translation,
and it will require implementing the DMA offset on ARM64, which I was
hoping to avoid.

	Arnd
Sinan Kaya Nov. 10, 2015, 4:06 p.m. UTC | #9
On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
 > No, as Timur found, the driver is correct and it intentionally
> sets the 32-bit mask, and that is guaranteed to work on all sane
> hardware. Don't change the driver but find a better platform for
> your workload, or talk to the people that are responsible for
> the platform and get them to fix it.

Platform does have an IOMMU. No issues there. I am trying to clean out 
the patch pipe I have in order to get this card working with and without 
IOMMU.

>
> If the platform also doesn't have an IOMMU, you can probably work
> around it by setting up the dma-ranges property of the PCI host
> to map the low PCI addresses to the start of RAM. This will also
> require changes in the bootloader to set up the PCI outbound translation,
> and it will require implementing the DMA offset on ARM64, which I was
> hoping to avoid.

 From the email thread, it looks like this was introduced to support 
some legacy card that has 64 bit addressing limitations and is being 
carried around ("rotted") since then.

I'm the second guy after the powerpc architecture complaining about the 
very same issue. Any red flags?

I can't change the address map for PCIe. SBSA requires all inbound PCIe 
addresses to be non-translated.

I'll just have to stick with IOMMU for this card.
Arnd Bergmann Nov. 10, 2015, 4:47 p.m. UTC | #10
On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
>  > No, as Timur found, the driver is correct and it intentionally
> > sets the 32-bit mask, and that is guaranteed to work on all sane
> > hardware. Don't change the driver but find a better platform for
> > your workload, or talk to the people that are responsible for
> > the platform and get them to fix it.
> 
> Platform does have an IOMMU. No issues there. I am trying to clean out 
> the patch pipe I have in order to get this card working with and without 
> IOMMU.

On PowerPC, I think we automatically enable the IOMMU whenever a DMA
mask is set that doesn't cover all of the RAM. We could think about
doing the same thing on ARM64 to make all devices work out of the box.

> > If the platform also doesn't have an IOMMU, you can probably work
> > around it by setting up the dma-ranges property of the PCI host
> > to map the low PCI addresses to the start of RAM. This will also
> > require changes in the bootloader to set up the PCI outbound translation,
> > and it will require implementing the DMA offset on ARM64, which I was
> > hoping to avoid.
> 
>  From the email thread, it looks like this was introduced to support 
> some legacy card that has 64 bit addressing limitations and is being 
> carried around ("rotted") since then.
> 
> I'm the second guy after the powerpc architecture complaining about the 
> very same issue. Any red flags?

What BenH was worried about here is that the driver sets different masks
for streaming and coherent mappings, which is indeed a worry that
could hit us on ARM as well, but I suppose we'll have to deal with
that in platform code.

Setting both masks to 32-bit is something that a lot of drivers do,
and without IOMMU enabled, you'd hit the same bug on all of them.
 
> I can't change the address map for PCIe. SBSA requires all inbound PCIe 
> addresses to be non-translated.

What about changing the memory map? I suspect there will be more
problems for you in the future when all of your RAM is at high
addresses.  Is this something you could fix in the bootloader by
moving the first 2GB to a different CPU physical address?

> I'll just have to stick with IOMMU for this card.

Ok. But how do you currently decide whether to use the IOMMU or not?

	Arnd
Timur Tabi Nov. 10, 2015, 5 p.m. UTC | #11
On 11/10/2015 10:47 AM, Arnd Bergmann wrote:
> What BenH was worried about here is that the driver sets different masks
> for streaming and coherent mappings, which is indeed a worry that
> could hit us on ARM as well, but I suppose we'll have to deal with
> that in platform code.
>
> Setting both masks to 32-bit is something that a lot of drivers do,
> and without IOMMU enabled, you'd hit the same bug on all of them.

Also note that I think that on PowerPC, the mask is set to 32 by default 
for all devices.  I don't think we do that on ARM64.  So on PowerPC, 
some drivers get away with not explicitly setting the mask.
Sinan Kaya Nov. 10, 2015, 5:19 p.m. UTC | #12
On 11/10/2015 11:47 AM, Arnd Bergmann wrote:
> On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
>> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
>>   > No, as Timur found, the driver is correct and it intentionally
>>> sets the 32-bit mask, and that is guaranteed to work on all sane
>>> hardware. Don't change the driver but find a better platform for
>>> your workload, or talk to the people that are responsible for
>>> the platform and get them to fix it.
>>
>> Platform does have an IOMMU. No issues there. I am trying to clean out
>> the patch pipe I have in order to get this card working with and without
>> IOMMU.
>
> On PowerPC, I think we automatically enable the IOMMU whenever a DMA
> mask is set that doesn't cover all of the RAM. We could think about
> doing the same thing on ARM64 to make all devices work out of the box.
>

The ACPI IORT table declares whether you enable IOMMU for a particular 
device or not. The placement of IOMMU HW is system specific. The IORT 
table gives the IOMMU HW topology to the operating system.

>>> If the platform also doesn't have an IOMMU, you can probably work
>>> around it by setting up the dma-ranges property of the PCI host
>>> to map the low PCI addresses to the start of RAM. This will also
>>> require changes in the bootloader to set up the PCI outbound translation,
>>> and it will require implementing the DMA offset on ARM64, which I was
>>> hoping to avoid.
>>
>>   From the email thread, it looks like this was introduced to support
>> some legacy card that has 64 bit addressing limitations and is being
>> carried around ("rotted") since then.
>>
>> I'm the second guy after the powerpc architecture complaining about the
>> very same issue. Any red flags?
>
> What BenH was worried about here is that the driver sets different masks
> for streaming and coherent mappings, which is indeed a worry that
> could hit us on ARM as well, but I suppose we'll have to deal with
> that in platform code.
>
> Setting both masks to 32-bit is something that a lot of drivers do,
> and without IOMMU enabled, you'd hit the same bug on all of them.
>

Maybe, maybe not. This is the only card that I had problems with.

>> I can't change the address map for PCIe. SBSA requires all inbound PCIe
>> addresses to be non-translated.
>
> What about changing the memory map? I suspect there will be more
> problems for you in the future when all of your RAM is at high
> addresses.  Is this something you could fix in the bootloader by
> moving the first 2GB to a different CPU physical address?

I'm thinking about this.

>
>> I'll just have to stick with IOMMU for this card.
>
> Ok. But how do you currently decide whether to use the IOMMU or not?
>

ACPI table. I wanted to get this fix in so that all operating systems 
would work whether they have the IOMMU driver enabled or not.

> 	Arnd
>
James Bottomley Nov. 10, 2015, 6:27 p.m. UTC | #13
On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote:
> On 11/10/2015 11:47 AM, Arnd Bergmann wrote:
> > On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
> >> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
> >>   From the email thread, it looks like this was introduced to support
> >> some legacy card that has 64 bit addressing limitations and is being
> >> carried around ("rotted") since then.
> >>
> >> I'm the second guy after the powerpc architecture complaining about the
> >> very same issue. Any red flags?
> >
> > What BenH was worried about here is that the driver sets different masks
> > for streaming and coherent mappings, which is indeed a worry that
> > could hit us on ARM as well, but I suppose we'll have to deal with
> > that in platform code.
> >
> > Setting both masks to 32-bit is something that a lot of drivers do,
> > and without IOMMU enabled, you'd hit the same bug on all of them.
> >
> 
> Maybe, maybe not. This is the only card that I had problems with.

Your characterisation of "some legacy card" isn't entirely correct.
Just to clarify how this happens, most I/O cards today are intelligent
offload engines which means they have some type of embedded CPU (it can
even be a specially designed asic).  This CPU is driven by firmware
which is mostly (but not always) in the machine language of the CPU.
DMA transfers are sometimes run by this CPU, but mostly handed off to a
separate offload engine.  When the board gets revised, it's often easier
to update the offload engine to 64 bits and keep the CPU at 32 (or even
16) bits.  This means that all the internal addresses in the firmware
are 32 bit only.  As I read the comments in the original thread, it
looks like the mpt people tried to mitigate this by using segment
registers for external addresses firmware uses ... that's why they say
that they don't have to have all the addresses in DMA32 ... they just
need the upper 32 bits to be constant so they can correctly program the
segment register.  Unfortunately, we have no way to parametrise this to
the DMA allocation code.

You'll find the same thing with Adaptec SPI cards.  Their route to 64
bits was via an initial 39 bit extension that had them layering the
additional 7 bits into the unused lower region of the page descriptors
for the firmware (keeping the actual pointers to DMA at 32 bits because
they're always parametrised as address, offset, length and the address
is always a 4k page).

Eventually, everything will rev to 64 bits and this problem will go
away, but, as I suspect you know, it takes time for the embedded world
to get to where everyone else already is.

As Arnd said, if you failed to allow for this in your platform, then
oops, just don't use the card.  I think this solution would be better
than trying to get the driver to work out which cards can support 64 bit
firmware descriptors and only failing on your platform for those that
can't.

James


Arnd Bergmann Nov. 10, 2015, 7:13 p.m. UTC | #14
On Tuesday 10 November 2015 11:00:59 Timur Tabi wrote:
> On 11/10/2015 10:47 AM, Arnd Bergmann wrote:
> > What BenH was worried about here is that the driver sets different masks
> > for streaming and coherent mappings, which is indeed a worry that
> > could hit us on ARM as well, but I suppose we'll have to deal with
> > that in platform code.
> >
> > Setting both masks to 32-bit is something that a lot of drivers do,
> > and without IOMMU enabled, you'd hit the same bug on all of them.
> 
> Also note that I think that on PowerPC, the mask is set to 32 by default 
> for all devices.  I don't think we do that on ARM64.  So on PowerPC, 
> some drivers get away with not explicitly setting the mask.
> 

If the mask is 64-bit by default on ARM64, that is a bug that we need
to fix urgently. Can you verify this?

A lot of PCI devices can only do 32-bit DMA, and we have plenty
of drivers that don't bother setting a mask at all because the 32-bit
mask is the default on all other architectures.

	Arnd
Sinan Kaya Nov. 10, 2015, 7:14 p.m. UTC | #15
On 11/10/2015 1:27 PM, James Bottomley wrote:
> On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote:
>> On 11/10/2015 11:47 AM, Arnd Bergmann wrote:
>>> On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
>>>> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
>>>>    From the email thread, it looks like this was introduced to support
>>>> some legacy card that has 64 bit addressing limitations and is being
>>>> carried around ("rotted") since then.
>>>>
>>>> I'm the second guy after the powerpc architecture complaining about the
>>>> very same issue. Any red flags?
>>>
>>> What BenH was worried about here is that the driver sets different masks
>>> for streaming and coherent mappings, which is indeed a worry that
>>> could hit us on ARM as well, but I suppose we'll have to deal with
>>> that in platform code.
>>>
>>> Setting both masks to 32-bit is something that a lot of drivers do,
>>> and without IOMMU enabled, you'd hit the same bug on all of them.
>>>
>>
>> Maybe, maybe not. This is the only card that I had problems with.
>
> Your characterisation of "some legacy card" isn't entirely correct.
> Just to clarify how this happens, most I/O cards today are intelligent
> offload engines which means they have some type of embedded CPU (it can
> even be a specially designed asic).  This CPU is driven by firmware
> which is mostly (but not always) in the machine language of the CPU.
> DMA transfers are sometimes run by this CPU, but mostly handed off to a
> separate offload engine.  When the board gets revised, it's often easier
> to update the offload engine to 64 bits and keep the CPU at 32 (or even
> 16) bits.  This means that all the internal addresses in the firmware
> are 32 bit only.  As I read the comments in the original thread, it
> looks like the mpt people tried to mitigate this by using segment
> registers for external addresses firmware uses ... that's why they say
> that they don't have to have all the addresses in DMA32 ... they just
> need the upper 32 bits to be constant so they can correctly program the
> segment register.  Unfortunately, we have no way to parametrise this to
> the DMA allocation code.
>
> You'll find the same thing with Adaptec SPI cards.  Their route to 64
> bits was via an initial 39 bit extension that had them layering the
> additional 7 bits into the unused lower region of the page descriptors
> for the firmware (keeping the actual pointers to DMA at 32 bits because
> they're always parametrised as address, offset, length and the address
> is always a 4k page).
>
> Eventually, everything will rev to 64 bits and this problem will go
> away, but, as I suspect you know, it takes time for the embedded world
> to get to where everyone else already is.
>
> As Arnd said, if you failed to allow for this in your platform, then
> oops, just don't use the card.  I think this solution would be better
> than trying to get the driver to work out which cards can support 64 bit
> firmware descriptors and only failing on your platform for those that
> can't.
>
> James
>
>

James,
I was referring to this conversation here.

https://lkml.org/lkml/2015/2/20/31

"The aic79xx hardware problem was that the DMA engine could address the 
whole of memory (it had two address modes, a 39 bit one and a 64 bit 
one) but the script engine that runs the mailboxes only had a 32 bit 
activation register (the activating write points at the physical address 
of the script to begin executing)."

The fact that LSI SAS 92118i is working with 64 bit addresses suggests 
to me that this problem is already solved.  I have not hit any kind of 
regressions with 93xx and 92xx families under load in a true 64 bit 
environment. I am only mentioning this based on my testing exposure.

Another comment here from you.
https://lkml.org/lkml/2015/4/2/28

"Well, it was originally a hack for altix, because they had no regions
below 4GB and had to specifically manufacture them.  As you know, in
Linux, if Intel doesn't need it, no-one cares and the implementation
bitrots."

Maybe, it is time to fix the code for more recent (even decent) hardware?

Sinan
James Bottomley Nov. 10, 2015, 7:43 p.m. UTC | #16
On Tue, 2015-11-10 at 14:14 -0500, Sinan Kaya wrote:
> 
> On 11/10/2015 1:27 PM, James Bottomley wrote:
> > On Tue, 2015-11-10 at 12:19 -0500, Sinan Kaya wrote:
> >> On 11/10/2015 11:47 AM, Arnd Bergmann wrote:
> >>> On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
> >>>> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
> >>>>    From the email thread, it looks like this was introduced to support
> >>>> some legacy card that has 64 bit addressing limitations and is being
> >>>> carried around ("rotted") since then.
> >>>>
> >>>> I'm the second guy after the powerpc architecture complaining about the
> >>>> very same issue. Any red flags?
> >>>
> >>> What BenH was worried about here is that the driver sets different masks
> >>> for streaming and coherent mappings, which is indeed a worry that
> >>> could hit us on ARM as well, but I suppose we'll have to deal with
> >>> that in platform code.
> >>>
> >>> Setting both masks to 32-bit is something that a lot of drivers do,
> >>> and without IOMMU enabled, you'd hit the same bug on all of them.
> >>>
> >>
> >> Maybe, maybe not. This is the only card that I had problems with.
> >
> > Your characterisation of "some legacy card" isn't entirely correct.
> > Just to clarify how this happens, most I/O cards today are intelligent
> > offload engines which means they have some type of embedded CPU (it can
> > even be a specially designed asic).  This CPU is driven by firmware
> > which is mostly (but not always) in the machine language of the CPU.
> > DMA transfers are sometimes run by this CPU, but mostly handed off to a
> > separate offload engine.  When the board gets revised, it's often easier
> > to update the offload engine to 64 bits and keep the CPU at 32 (or even
> > 16) bits.  This means that all the internal addresses in the firmware
> > are 32 bit only.  As I read the comments in the original thread, it
> > looks like the mpt people tried to mitigate this by using segment
> > registers for external addresses firmware uses ... that's why they say
> > that they don't have to have all the addresses in DMA32 ... they just
> > need the upper 32 bits to be constant so they can correctly program the
> > segment register.  Unfortunately, we have no way to parametrise this to
> > the DMA allocation code.
> >
> > You'll find the same thing with Adaptec SPI cards.  Their route to 64
> > bits was via an initial 39 bit extension that had them layering the
> > additional 7 bits into the unused lower region of the page descriptors
> > for the firmware (keeping the actual pointers to DMA at 32 bits because
> > they're always parametrised as address, offset, length and the address
> > is always a 4k page).
> >
> > Eventually, everything will rev to 64 bits and this problem will go
> > away, but, as I suspect you know, it takes time for the embedded world
> > to get to where everyone else already is.
> >
> > As Arnd said, if you failed to allow for this in your platform, then
> > oops, just don't use the card.  I think this solution would be better
> > than trying to get the driver to work out which cards can support 64 bit
> > firmware descriptors and only failing on your platform for those that
> > can't.
> >
> > James
> >
> >
> 
> James,
> I was referring to this conversation here.
> 
> https://lkml.org/lkml/2015/2/20/31
> 
> "The aic79xx hardware problem was that the DMA engine could address the 
> whole of memory (it had two address modes, a 39 bit one and a 64 bit 
> one) but the script engine that runs the mailboxes only had a 32 bit 
> activation register (the activating write points at the physical address 
> of the script to begin executing)."
> 
> The fact that LSI SAS 92118i is working with 64 bit addresses suggests 
> to me that this problem is already solved.  I have not hit any kind of 
> regressions with 93xx and 92xx families under load in a true 64 bit 
> environment. I am only mentioning this based on my testing exposure.

The Issue, as stated by LSI is

        Initially set the consistent DMA mask to 32 bit and then change
        it
        to 64 bit mask after allocating RDPQ pools by calling the
        function
        _base_change_consistent_dma_mask. This is to ensure that all the
        upper 32 bits of RDPQ entries's base address to be same.

If you set a 64 bit coherent mask before this point, you're benefiting
from being lucky that all the upper 32 bits of the allocations are the
same ... we can't code a driver to rely on luck.  Particularly not when
the failure mode looks like it would be silent and deadly.
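
(Roughly the ordering that text describes, as a sketch in the driver's
context; the RDPQ pool allocation helper name below is hypothetical, only
_base_change_consistent_dma_mask is named by the LSI text:)

static int example_alloc_rdpq(struct MPT3SAS_ADAPTER *ioc,
			      struct pci_dev *pdev)
{
	int rc;

	/* Keep coherent allocations below 4GB while the RDPQ pools are
	 * created, so every pool shares the same upper 32 address bits. */
	rc = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
	if (rc)
		return rc;

	rc = _example_allocate_rdpq_pools(ioc);	/* hypothetical helper */
	if (rc)
		return rc;

	/* Only now widen the coherent mask; in the driver this is what
	 * _base_change_consistent_dma_mask() does. */
	return pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
}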

> Another comment here from you.
> https://lkml.org/lkml/2015/4/2/28
> 
> "Well, it was originally a hack for altix, because they had no regions
> below 4GB and had to specifically manufacture them.  As you know, in
> Linux, if Intel doesn't need it, no-one cares and the implementation
> bitrots."
> 
> Maybe, it is time to fix the code for more recent (even decent) hardware?

What do you mean "fix the code"?  The code isn't broken, it's
parametrising issues with particular hardware.  There's no software work
around (except allocating memory with the correct characteristics).

James


Sinan Kaya Nov. 10, 2015, 7:56 p.m. UTC | #17
On 11/10/2015 2:43 PM, James Bottomley wrote:
> The Issue, as stated by LSI is
>
>          Initially set the consistent DMA mask to 32 bit and then change
>          it
>          to 64 bit mask after allocating RDPQ pools by calling the
>          function
>          _base_change_consistent_dma_mask. This is to ensure that all the
>          upper 32 bits of RDPQ entries's base address to be same.
>

Need somebody from mpt to confirm that this behavior is still valid for 
the recent cards besides altix.

> If you set a 64 bit coherent mask before this point, you're benefiting
> from being lucky that all the upper 32 bits of the allocations are the
> same ... we can't code a driver to rely on luck.  Particularly not when
> the failure mode looks like it would be silent and deadly.

Of course nobody wants unreliable code.

I'm wondering whether I was just lucky during my testing or whether the 
92xx and 93xx hardware supports the full 64 bit range. I don't have any 
insight into what the endpoint does or what it is capable of.

>
>> >Another comment here from you.
>> >https://lkml.org/lkml/2015/4/2/28
>> >
>> >"Well, it was originally a hack for altix, because they had no regions
>> >below 4GB and had to specifically manufacture them.  As you know, in
>> >Linux, if Intel doesn't need it, no-one cares and the implementation
>> >bitrots."
>> >
>> >Maybe, it is time to fix the code for more recent (even decent) hardware?
> What do you mean "fix the code"?  The code isn't broken, it's
> parametrising issues with particular hardware.  There's no software work
> around (except allocating memory with the correct characteristics).

Need confirmation. I'm questioning whether we are stuck with this behavior 
because of Altix or something else. If it is the latter, the code could 
have used the PCI ID to do something special for it.
Arnd Bergmann Nov. 10, 2015, 7:56 p.m. UTC | #18
On Tuesday 10 November 2015 12:19:33 Sinan Kaya wrote:
> On 11/10/2015 11:47 AM, Arnd Bergmann wrote:
> > On Tuesday 10 November 2015 11:06:40 Sinan Kaya wrote:
> >> On 11/10/2015 3:38 AM, Arnd Bergmann wrote:
> >>   > No, as Timur found, the driver is correct and it intentionally
> >>> sets the 32-bit mask, and that is guaranteed to work on all sane
> >>> hardware. Don't change the driver but find a better platform for
> >>> your workload, or talk to the people that are responsible for
> >>> the platform and get them to fix it.
> >>
> >> Platform does have an IOMMU. No issues there. I am trying to clean out
> >> the patch pipe I have in order to get this card working with and without
> >> IOMMU.
> >
> > On PowerPC, I think we automatically enable the IOMMU whenever a DMA
> > mask is set that doesn't cover all of the RAM. We could think about
> > doing the same thing on ARM64 to make all devices work out of the box.
> >
> 
> The ACPI IORT table declares whether you enable IOMMU for a particular 
> device or not. The placement of IOMMU HW is system specific. The IORT 
> table gives the IOMMU HW topology to the operating system.

This sounds odd. Clearly you need to specify the IOMMU settings for each
possible PCI device independent of whether the OS actually uses the IOMMU
or not. In a lot of cases, we want to turn it off to get better performance
when the driver has set a DMA mask that covers all of RAM, but you
also want to enable the IOMMU for debugging purposes or for device
assignment if you run virtual machines. The bootloader doesn't know how
the device is going to be used, so it cannot define the policy here.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
James Bottomley Nov. 10, 2015, 8:05 p.m. UTC | #19
On Tue, 2015-11-10 at 14:56 -0500, Sinan Kaya wrote:
> 
> On 11/10/2015 2:43 PM, James Bottomley wrote:
> > The Issue, as stated by LSI is
> >
> >          Initially set the consistent DMA mask to 32 bit and then change
> >          it
> >          to 64 bit mask after allocating RDPQ pools by calling the
> >          function
> >          _base_change_consistent_dma_mask. This is to ensure that all the
> >          upper 32 bits of RDPQ entries's base address to be same.
> >
> 
> Need somebody from mpt to confirm that this behavior is still valid for 
> the recent cards besides altix.

OK, you don't seem to be understanding the problem: the Altix isn't a
LSI card, it was a SGI platform.  It was the platform where we first
discovered the issue that a lot of storage cards didn't work because it
by default had no memory below 4GB.  The reason coherent masks were
introduced was initially so the Altix could manufacture and manage a
region of memory in the lower 4GB region and we would guarantee to make
allocations from it so the storage cards would then work on that
platform.

I thought the Altix was a historical relic because after they
disappeared, there was no other platform with this issue ...  until you
came along.

James


Sinan Kaya Nov. 10, 2015, 8:26 p.m. UTC | #20
On 11/10/2015 3:05 PM, James Bottomley wrote:
> OK, you don't seem to be understanding the problem: the Altix isn't a
> LSI card, it was a SGI platform.

Got it.

> It was the platform where we first
> discovered the issue that a lot of storage cards didn't work because it
> by default had no memory below 4GB.  The reason coherent masks were
> introduced was initially so the Altix could manufacture and manage a
> region of memory in the lower 4GB region and we would guarantee to make
> allocations from it so the storage cards would then work on that
> platform.

I can't fix the issue if the card cannot do 64 bit DMA when IOMMU is not 
there. I need IOMMU enabled all the time for this card.
James Bottomley Nov. 10, 2015, 8:35 p.m. UTC | #21
On Tue, 2015-11-10 at 15:26 -0500, Sinan Kaya wrote:
> 
> On 11/10/2015 3:05 PM, James Bottomley wrote:
> > OK, you don't seem to be understanding the problem: the Altix isn't a
> > LSI card, it was a SGI platform.
> 
> Got it.
> 
> > It was the platform where we first
> > discovered the issue that a lot of storage cards didn't work because it
> > by default had no memory below 4GB.  The reason coherent masks were
> > introduced was initially so the Altix could manufacture and manage a
> > region of memory in the lower 4GB region and we would guarantee to make
> > allocations from it so the storage cards would then work on that
> > platform.
> 
> I can't fix the issue if the card cannot do 64 bit DMA when IOMMU is not 
> there. I need IOMMU enabled all the time for this card.

That depends on the limitations of your platform.  The Altix only used
an iommu to manufacture the coherent memory for the descriptors, but the
card itself mostly operated in bypass mode (using 64 bit physical
addresses rather than iommu remapped ones), so all accesses except for
the few firmware descriptor ones didn't use an iommu.  Apparently this
was for performance reasons.

So, to recap, the card itself *can* do 64 bit DMA.  The limitation is
that the RDPQ descriptors need to be all in the same region of 4G memory
and the way the driver ensures this is to set the 64 bit coherent mask
after using the 32 bit one to allocate the RDPQ pools.

James


Sinan Kaya Nov. 10, 2015, 8:58 p.m. UTC | #22
On 11/10/2015 2:56 PM, Arnd Bergmann wrote:
>> The ACPI IORT table declares whether you enable IOMMU for a particular
>> >device or not. The placement of IOMMU HW is system specific. The IORT
>> >table gives the IOMMU HW topology to the operating system.
> This sounds odd. Clearly you need to specify the IOMMU settings for each
> possible PCI device independent of whether the OS actually uses the IOMMU
> or not.

There are provisions in the IORT table to have the DMA mask at the PCIe 
host bridge level, not at the PCIe device level. This setting is specific 
to each PCIe bus; it is not per PCIe device.

It is assumed that the endpoint device driver knows the hardware for 
PCIe devices. The driver can also query the supported DMA bits by this 
platform via DMA APIs and will request the correct DMA mask from the DMA 
subsystem (surprise!).

>In a lot of cases, we want to turn it off to get better performance
> when the driver has set a DMA mask that covers all of RAM, but you
> also want to enable the IOMMU for debugging purposes or for device
> assignment if you run virtual machines. The bootloader doesn't know how
> the device is going to be used, so it cannot define the policy here.

I think we'll end up adding a virtualization option to the UEFI BIOS 
similar to how Intel platforms work. Based on this switch, we'll end up 
patching the ACPI table.

If I remove the IORT entry, then the device is in coherent mode with 
device accessing the full RAM range.

If I have the IORT table, the device is in IOMMU translation mode.

Details are in the IORT spec.
Timur Tabi Nov. 10, 2015, 9:03 p.m. UTC | #23
On 11/10/2015 01:13 PM, Arnd Bergmann wrote:
> If the mask is 64-bit by default on ARM64, that is a bug that we need
> to fix urgently. Can you verify this?

I think the mask is 0 by default, because there's no code in ARM64 that 
actually sets the mask.

Take a look at arch_setup_pdev_archdata() in 
arch/powerpc/kernel/setup-common.c.

void arch_setup_pdev_archdata(struct platform_device *pdev)
{
         pdev->archdata.dma_mask = DMA_BIT_MASK(32);
         pdev->dev.dma_mask = &pdev->archdata.dma_mask;
         set_dma_ops(&pdev->dev, &dma_direct_ops);
}

I don't see anything equivalent in arch/arm64

> A lot of PCI devices can only do 32-bit DMA, and we have plenty
> of drivers that don't bother setting a mask at all because the 32-bit
> mask is the default on all other architectures.

In our drivers for 32-bit devices, we have to explicitly set the DMA 
mask to 32-bits in order to get any DMA working.
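
(That boils down to something like the following in each probe routine; an
illustrative sketch, not any specific driver:)

#include <linux/platform_device.h>
#include <linux/dma-mapping.h>

static int example_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;

	/* Nothing in the arm64/ACPI path has set this pointer up for us. */
	if (!dev->dma_mask)
		dev->dma_mask = &dev->coherent_dma_mask;

	/* The device can only address 32 bits. */
	return dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
}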
Arnd Bergmann Nov. 10, 2015, 9:54 p.m. UTC | #24
On Tuesday 10 November 2015 15:03:59 Timur Tabi wrote:
> On 11/10/2015 01:13 PM, Arnd Bergmann wrote:
> > If the mask is 64-bit by default on ARM64, that is a bug that we need
> > to fix urgently. Can you verify this?
> 
> I think the mask is 0 by default, because there's no code in ARM64 that 
> actually sets the mask.
> 
> Take a look at arch_setup_pdev_archdata() in 
> arch/powerpc/kernel/setup-common.c.
> 
> void arch_setup_pdev_archdata(struct platform_device *pdev)
> {
>          pdev->archdata.dma_mask = DMA_BIT_MASK(32);
>          pdev->dev.dma_mask = &pdev->archdata.dma_mask;
>          set_dma_ops(&pdev->dev, &dma_direct_ops);
> }
> 
> I don't see anything equivalent in arch/arm64

of_dma_configure() sets up an initial DMA mask for 32 bits. The
same thing happens for pci_setup_device() in architecture-independent
code?

> > A lot of PCI devices can only do 32-bit DMA, and we have plenty
> > of drivers that don't bother setting a mask at all because the 32-bit
> > mask is the default on all other architectures.
> 
> In our drivers for 32-bit devices, we have to explicitly set the DMA 
> mask to 32-bits in order to get any DMA working.

Do you mean PCI devices or platform devices?

Maybe the parent bus is lacking a dma-ranges property?

	Arnd
Timur Tabi Nov. 10, 2015, 9:59 p.m. UTC | #25
On 11/10/2015 03:54 PM, Arnd Bergmann wrote:

>> In our drivers for 32-bit devices, we have to explicitly set the DMA
>> mask to 32-bits in order to get any DMA working.
>
> Do you mean PCI devices or platform devices?

Platform.

> Maybe the parent bus is lacking a dma-ranges property?

All of this applies only on device-tree platforms.  Sinan and I are 
working on an ACPI server platform.  So we never call 
of_dma_configure(), and we don't have a dma-ranges property.
Arnd Bergmann Nov. 10, 2015, 10:06 p.m. UTC | #26
On Tuesday 10 November 2015 15:58:19 Sinan Kaya wrote:
> 
> On 11/10/2015 2:56 PM, Arnd Bergmann wrote:
> >> The ACPI IORT table declares whether you enable IOMMU for a particular
> >> >device or not. The placement of IOMMU HW is system specific. The IORT
> >> >table gives the IOMMU HW topology to the operating system.
> > This sounds odd. Clearly you need to specify the IOMMU settings for each
> > possible PCI device independent of whether the OS actually uses the IOMMU
> > or not.
> 
> There are provisions in the IORT table to have the DMA mask at the PCIe 
> host bridge level, not at the PCIe device level. This setting is specific 
> to each PCIe bus; it is not per PCIe device.

Same thing, I meant the bootloader must provide all the information that
is needed to use the IOMMU on all PCI devices. I don't care where the IOMMU
driver gets that information. Some IOMMUs require programming a
bus/device/function specific number into the I/O page tables, and they
might not always have the same algorithm to map from the PCI numbers
into their own number space.

> It is assumed that the endpoint device driver knows the hardware for 
> PCIe devices. The driver can also query the supported DMA bits by this 
> platform via DMA APIs and will request the correct DMA mask from the DMA 
> subsystem (surprise!).

I know how the negotiation works. Note that dma_get_required_mask()
will only tell you what mask the device needs to access all of memory,
while both the device and bus may have additional limitations, and
there is not always a solution.

> >In a lot of cases, we want to turn it off to get better performance
> > when the driver has set a DMA mask that covers all of RAM, but you
> > also want to enable the IOMMU for debugging purposes or for device
> > assignment if you run virtual machines. The bootloader doesn't know how
> > the device is going to be used, so it cannot define the policy here.
> 
> I think we'll end up adding a virtualization option to the UEFI BIOS 
> similar to how Intel platforms work. Based on this switch, we'll end up 
> patching the ACPI table.
> 
> If I remove the IORT entry, then the device is in coherent mode with 
> device accessing the full RAM range.
> 
> If I have the IORT table, the device is in IOMMU translation mode.
> 
> Details are in the IORT spec.

I think that would suck a lot more than being slightly out of spec
regarding SBSA if you make the low PCI addresses map to the start
of RAM. Asking users to select a 'virtualization' option based on
what kind of PCI device and kernel version they have is a major
hassle.

	Arnd
Arnd Bergmann Nov. 10, 2015, 10:08 p.m. UTC | #27
On Tuesday 10 November 2015 15:59:18 Timur Tabi wrote:
> On 11/10/2015 03:54 PM, Arnd Bergmann wrote:
> 
> >> In our drivers for 32-bit devices, we have to explicitly set the DMA
> >> mask to 32-bits in order to get any DMA working.
> >
> > Do you mean PCI devices or platform devices?
> 
> Platform.
> 
> > Maybe the parent bus is lacking a dma-ranges property?
> 
> All of this applies only on device-tree platforms.  Sinan and I are 
> working on an ACPI server platform.  So we never call 
> of_dma_configure(), and we don't have a dma-ranges property.

ACPI must have something else to mark DMA master devices and their
capabilities, right?

The platform should initialize the dma_mask pointer to a per-device
mask and set both the dma_mask and dma_coherent_mask to 32 bits
for any DMA master device, and the device driver should override
that to be an appropriate mask based on its needs later.

	Arnd

Patch

diff --git a/drivers/scsi/mpt2sas/mpt2sas_base.c b/drivers/scsi/mpt2sas/mpt2sas_base.c
index c167911..c61c82a 100644
--- a/drivers/scsi/mpt2sas/mpt2sas_base.c
+++ b/drivers/scsi/mpt2sas/mpt2sas_base.c
@@ -1217,8 +1217,27 @@  _base_config_dma_addressing(struct MPT2SAS_ADAPTER *ioc, struct pci_dev *pdev)
 		ioc->base_add_sg_single = &_base_add_sg_single_32;
 		ioc->sge_size = sizeof(Mpi2SGESimple32_t);
 		ioc->dma_mask = 32;
-	} else
+	} else {
+		/* Try 64 bit, 32 bit failed */
+		consistent_dma_mask = DMA_BIT_MASK(64);
+
+		if (sizeof(dma_addr_t) > 4) {
+			const uint64_t required_mask =
+				dma_get_required_mask(&pdev->dev);
+			if ((required_mask > DMA_BIT_MASK(32)) &&
+				!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+				!pci_set_consistent_dma_mask(pdev,
+							consistent_dma_mask)) {
+				ioc->base_add_sg_single =
+					&_base_add_sg_single_64;
+				ioc->sge_size = sizeof(Mpi2SGESimple64_t);
+				ioc->dma_mask = 64;
+				goto out;
+			}
+		}
+
 		return -ENODEV;
+	}
 
  out:
 	si_meminfo(&s);
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index d4f1dcd..6dc391c 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -1535,9 +1535,29 @@  _base_config_dma_addressing(struct MPT3SAS_ADAPTER *ioc, struct pci_dev *pdev)
 		ioc->base_add_sg_single = &_base_add_sg_single_32;
 		ioc->sge_size = sizeof(Mpi2SGESimple32_t);
 		ioc->dma_mask = 32;
-	} else
+	} else {
+		/* Try 64 bit, 32 bit failed */
+		consistent_dma_mask = DMA_BIT_MASK(64);
+		if (sizeof(dma_addr_t) > 4) {
+			const uint64_t required_mask =
+				dma_get_required_mask(&pdev->dev);
+			int consistent_mask =
+				pci_set_consistent_dma_mask(pdev,
+							consistent_dma_mask);
+
+			if ((required_mask > DMA_BIT_MASK(32)) &&
+				!pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) &&
+				!consistent_mask) {
+				ioc->base_add_sg_single =
+					&_base_add_sg_single_64;
+				ioc->sge_size = sizeof(Mpi2SGESimple64_t);
+				ioc->dma_mask = 64;
+				goto out;
+			}
+		}
 		return -ENODEV;
 
+	}
  out:
 	si_meminfo(&s);
 	pr_info(MPT3SAS_FMT