
iommu/arm-smmu-v3: Set GBPA to abort all transactions

Message ID 1522247980-31892-1-git-send-email-timur@codeaurora.org (mailing list archive)
State New, archived

Commit Message

Timur Tabi March 28, 2018, 2:39 p.m. UTC
From: Sameer Goel <sgoel@codeaurora.org>

Set SMMU_GBPA to abort all incoming transactions during the SMMU reset,
while SMMUEN==0.

This closes a race in the crash kernel: when the SMMU is disabled during
reset, a stray DMA from the crashed primary kernel bypasses translation,
so its IOVA is used directly as a (likely invalid) PA.

Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
---
 drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Marc Zyngier March 28, 2018, 3 p.m. UTC | #1
On 2018-03-28 15:39, Timur Tabi wrote:
> From: Sameer Goel <sgoel@codeaurora.org>
>
> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
> when SMMUEN==0.
>
> This prevents a race condition where a stray DMA from the crashed primary
> kernel can try to access an IOVA address as an invalid PA when SMMU is
> disabled during reset in the crash kernel.
>
> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
> ---
>  drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f2f1fc68b52..c04a89310c59 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	if (reg & CR0_SMMUEN)
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>
> +	/*
> +	 * Abort all incoming translations. This can happen in a kdump case
> +	 * where SMMU is initialized when a prior DMA is pending. Just
> +	 * disabling the SMMU in this case might result in writes to invalid
> +	 * PAs.
> +	 */
> +	ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
> +	if (ret) {
> +		dev_err(smmu->dev, "GBPA not responding to update\n");
> +		return ret;
> +	}
> +
>  	ret = arm_smmu_device_disable(smmu);
>  	if (ret)
>  		return ret;

A tangential question: can we reliably detect that the SMMU already has 
valid mappings, which would indicate that we're in a pretty bad shape 
already by the time we set that bit? For all we know, memory could have 
been corrupted long before we hit this point, and this patch barely 
narrows the window of opportunity.

At the very least, we should emit a warning and taint the kernel (we've 
recently added such measures to the GICv3 driver).

Thanks,

         M.
Will Deacon April 5, 2018, 11:26 a.m. UTC | #2
On Wed, Mar 28, 2018 at 09:39:40AM -0500, Timur Tabi wrote:
> From: Sameer Goel <sgoel@codeaurora.org>
> 
> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
> when SMMUEN==0.
> 
> This prevents a race condition where a stray DMA from the crashed primary
> kernel can try to access an IOVA address as an invalid PA when SMMU is
> disabled during reset in the crash kernel.
> 
> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
> ---
>  drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f2f1fc68b52..c04a89310c59 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>  	if (reg & CR0_SMMUEN)
>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>  
> +	/*
> +	 * Abort all incoming translations. This can happen in a kdump case
> +	 * where SMMU is initialized when a prior DMA is pending. Just
> +	 * disabling the SMMU in this case might result in writes to invalid
> +	 * PAs.
> +	 */
> +	ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
> +	if (ret) {
> +		dev_err(smmu->dev, "GBPA not responding to update\n");
> +		return ret;
> +	}

This needs to be predicated on the disable_bypass option, otherwise I think
it will cause regressions for systems that rely on passthrough.

Will
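
To illustrate the review comment above, here is a minimal user-space sketch of gating the abort on disable_bypass. It is not driver code: model_prepare_gbpa() is a hypothetical helper, the register is a plain variable, and only the GBPA_ABORT bit position is taken from the driver. The idea is that systems relying on passthrough (disable_bypass unset) keep their existing GBPA contents, while systems that asked for bypass to be disabled get ABORT set before the SMMU is turned off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ABORT bit of SMMU_GBPA, as defined in the driver. */
#define GBPA_ABORT (UINT32_C(1) << 20)

/*
 * Hypothetical helper: compute the GBPA value to program before the SMMU
 * is disabled during reset. *gbpa stands in for the register contents.
 */
static void model_prepare_gbpa(uint32_t *gbpa, bool disable_bypass)
{
	assert(gbpa != NULL);

	/* Only abort when the user asked for bypass to be disabled. */
	if (disable_bypass)
		*gbpa |= GBPA_ABORT;
	/* Otherwise leave GBPA alone so passthrough keeps working. */
}
```

In the real reset path the same condition would wrap the GBPA update call added by the patch; how the error path should behave in that case is left open here.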
Goel, Sameer April 11, 2018, 3:54 p.m. UTC | #3
On 4/5/2018 5:26 AM, Will Deacon wrote:
> On Wed, Mar 28, 2018 at 09:39:40AM -0500, Timur Tabi wrote:
>> From: Sameer Goel <sgoel@codeaurora.org>
>>
>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>> when SMMUEN==0.
>>
>> This prevents a race condition where a stray DMA from the crashed primary
>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>> disabled during reset in the crash kernel.
>>
>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 3f2f1fc68b52..c04a89310c59 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>  	if (reg & CR0_SMMUEN)
>>  		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>  
>> +	/*
>> +	 * Abort all incoming translations. This can happen in a kdump case
>> +	 * where SMMU is initialized when a prior DMA is pending. Just
>> +	 * disabling the SMMU in this case might result in writes to invalid
>> +	 * PAs.
>> +	 */
>> +	ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>> +	if (ret) {
>> +		dev_err(smmu->dev, "GBPA not responding to update\n");
>> +		return ret;
>> +	}
> 
> This needs to be predicated on the disable_bypass option, otherwise I think
> it will cause regressions for systems that rely on passthrough.
Ok, I'll make the change.
> 
> Will
>
Goel, Sameer April 11, 2018, 3:58 p.m. UTC | #4
On 3/28/2018 9:00 AM, Marc Zyngier wrote:
> On 2018-03-28 15:39, Timur Tabi wrote:
>> From: Sameer Goel <sgoel@codeaurora.org>
>>
>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>> when SMMUEN==0.
>>
>> This prevents a race condition where a stray DMA from the crashed primary
>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>> disabled during reset in the crash kernel.
>>
>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>> ---
>>  drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 3f2f1fc68b52..c04a89310c59 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>      if (reg & CR0_SMMUEN)
>>          dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>
>> +    /*
>> +     * Abort all incoming translations. This can happen in a kdump case
>> +     * where SMMU is initialized when a prior DMA is pending. Just
>> +     * disabling the SMMU in this case might result in writes to invalid
>> +     * PAs.
>> +     */
>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>> +    if (ret) {
>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>> +        return ret;
>> +    }
>> +
>>      ret = arm_smmu_device_disable(smmu);
>>      if (ret)
>>          return ret;
> 
> A tangential question: can we reliably detect that the SMMU already has valid mappings, which would indicate that we're in a pretty bad shape already by the time we set that bit? For all we know, memory could have been corrupted long before we hit this point, and this patch barely narrows the window of opportunity.
:) Yes that is correct. This only covers the kdump scenario. Trying to get some reliability when booting up the crash kernel. The system is already in a bad state. I don't think that this will happen in a normal scenario. But please point me to the GICv3 change and I'll have a look.
Thanks,
Sameer 
> 
> At the very least, we should emit a warning and taint the kernel (we've recently added such measures to the GICv3 driver).
> 
> Thanks,
> 
>         M.
Marc Zyngier April 11, 2018, 4:54 p.m. UTC | #5
Hi Sameer,

On 11/04/18 16:58, Goel, Sameer wrote:
> 
> 
> On 3/28/2018 9:00 AM, Marc Zyngier wrote:
>> On 2018-03-28 15:39, Timur Tabi wrote:
>>> From: Sameer Goel <sgoel@codeaurora.org>
>>>
>>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>>> when SMMUEN==0.
>>>
>>> This prevents a race condition where a stray DMA from the crashed primary
>>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>>> disabled during reset in the crash kernel.
>>>
>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>> ---
>>>  drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>>  1 file changed, 12 insertions(+)
>>>
>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>> index 3f2f1fc68b52..c04a89310c59 100644
>>> --- a/drivers/iommu/arm-smmu-v3.c
>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>>      if (reg & CR0_SMMUEN)
>>>          dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>>
>>> +    /*
>>> +     * Abort all incoming translations. This can happen in a kdump case
>>> +     * where SMMU is initialized when a prior DMA is pending. Just
>>> +     * disabling the SMMU in this case might result in writes to invalid
>>> +     * PAs.
>>> +     */
>>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>>> +    if (ret) {
>>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>>> +        return ret;
>>> +    }
>>> +
>>>      ret = arm_smmu_device_disable(smmu);
>>>      if (ret)
>>>          return ret;
>>
>> A tangential question: can we reliably detect that the SMMU already
>> has valid mappings, which would indicate that we're in a pretty bad
>> shape already by the time we set that bit? For all we know, memory
>> could have been corrupted long before we hit this point, and this
>> patch barely narrows the window of opportunity.
>
> :) Yes that is correct. This only covers the kdump scenario. Trying
> to get some reliability when booting up the crash kernel. The system
> is already in a bad state. I don't think that this will happen in a
> normal scenario. But please point me to the GICv3 change and I'll
> have a look.

See this:
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407

	M.
Robin Murphy April 12, 2018, 10:17 a.m. UTC | #6
On 11/04/18 17:54, Marc Zyngier wrote:
> Hi Sameer,
> 
> On 11/04/18 16:58, Goel, Sameer wrote:
>>
>>
>> On 3/28/2018 9:00 AM, Marc Zyngier wrote:
>>> On 2018-03-28 15:39, Timur Tabi wrote:
>>>> From: Sameer Goel <sgoel@codeaurora.org>
>>>>
>>>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>>>> when SMMUEN==0.
>>>>
>>>> This prevents a race condition where a stray DMA from the crashed primary
>>>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>>>> disabled during reset in the crash kernel.
>>>>
>>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>>> ---
>>>>   drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>>>   1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>> index 3f2f1fc68b52..c04a89310c59 100644
>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>>>       if (reg & CR0_SMMUEN)
>>>>           dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>>>
>>>> +    /*
>>>> +     * Abort all incoming translations. This can happen in a kdump case
>>>> +     * where SMMU is initialized when a prior DMA is pending. Just
>>>> +     * disabling the SMMU in this case might result in writes to invalid
>>>> +     * PAs.
>>>> +     */
>>>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>>>> +    if (ret) {
>>>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>>>> +        return ret;
>>>> +    }
>>>> +
>>>>       ret = arm_smmu_device_disable(smmu);
>>>>       if (ret)
>>>>           return ret;
>>>
>>> A tangential question: can we reliably detect that the SMMU already
>>> has valid mappings, which would indicate that we're in a pretty bad
>>> shape already by the time we set that bit? For all we know, memory
>>> could have been corrupted long before we hit this point, and this
>>> patch barely narrows the window of opportunity.
>>
>> :) Yes that is correct. This only covers the kdump scenario. Trying
>> to get some reliability when booting up the crash kernel. The system
>> is already in a bad state. I don't think that this will happen in a
>> normal scenario. But please point me to the GICv3 change and I'll
>> have a look.
> 
> See this:
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407

The nearest equivalent to that is probably the top-level SMMUEN check 
that we already have (see the diff context above). To go beyond that 
you'd have to chase the old stream table pointer and scan the whole 
thing looking for valid contexts, then potentially walk page tables 
within those contexts to check for live translations if you really 
wanted to be sure. That would be a hell of a lot of work to do in the 
boot path.

Robin.
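
For a sense of scale, here is a rough user-space sketch of only the first step described above: scanning a linear stream table for entries with the valid bit set. The names (model_count_valid_stes, STE_DWORDS, STE_0_V) are made up for illustration, and the table is an ordinary array rather than memory reached through the old stream table pointer. It assumes 64-byte STEs with V in bit 0 of the first dword. A real check would also have to map the old table, handle two-level tables, decode each STE's configuration (a valid STE may itself be set to bypass or abort), and then walk context descriptors and page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STE_DWORDS 8                 /* one STE = 8 x 64-bit words */
#define STE_0_V    (UINT64_C(1) << 0) /* valid bit in the first dword */

/*
 * Hypothetical helper: count entries of a linear stream table whose
 * valid bit is set. strtab points at num_entries consecutive STEs.
 */
static size_t model_count_valid_stes(const uint64_t *strtab, size_t num_entries)
{
	size_t valid = 0;

	assert(strtab != NULL || num_entries == 0);

	for (size_t i = 0; i < num_entries; i++) {
		/* Dword 0 of entry i lives at index i * STE_DWORDS. */
		if (strtab[i * STE_DWORDS] & STE_0_V)
			valid++;
	}
	return valid;
}
```

Even this trivial pass touches every entry of a table that can be large, which is the cost being pointed out for the boot path.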
Will Deacon April 12, 2018, 10:55 a.m. UTC | #7
On Thu, Apr 12, 2018 at 11:17:24AM +0100, Robin Murphy wrote:
> On 11/04/18 17:54, Marc Zyngier wrote:
> >On 11/04/18 16:58, Goel, Sameer wrote:
> >>On 3/28/2018 9:00 AM, Marc Zyngier wrote:
> >>>A tangential question: can we reliably detect that the SMMU already
> >>>has valid mappings, which would indicate that we're in a pretty bad
> >>>shape already by the time we set that bit? For all we know, memory
> >>>could have been corrupted long before we hit this point, and this
> >>>patch barely narrows the window of opportunity.
> >>
> >>:) Yes that is correct. This only covers the kdump scenario. Trying
> >>to get some reliability when booting up the crash kernel. The system
> >>is already in a bad state. I don't think that this will happen in a
> >>normal scenario. But please point me to the GICv3 change and I'll
> >>have a look.
> >
> >See this:
> >https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407
> 
> The nearest equivalent to that is probably the top-level SMMUEN check that
> we already have (see the diff context above). To go beyond that you'd have
> to chase the old stream table pointer and scan the whole thing looking for
> valid contexts, then potentially walk page tables within those contexts to
> check for live translations if you really wanted to be sure. That would be a
> hell of a lot of work to do in the boot path.

Yeah, please don't waste time writing a patch to do that! ;)

Will
Marc Zyngier April 12, 2018, 11:56 a.m. UTC | #8
On 12/04/18 11:17, Robin Murphy wrote:
> On 11/04/18 17:54, Marc Zyngier wrote:
>> Hi Sameer,
>>
>> On 11/04/18 16:58, Goel, Sameer wrote:
>>>
>>>
>>> On 3/28/2018 9:00 AM, Marc Zyngier wrote:
>>>> On 2018-03-28 15:39, Timur Tabi wrote:
>>>>> From: Sameer Goel <sgoel@codeaurora.org>
>>>>>
>>>>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>>>>> when SMMUEN==0.
>>>>>
>>>>> This prevents a race condition where a stray DMA from the crashed primary
>>>>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>>>>> disabled during reset in the crash kernel.
>>>>>
>>>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>>>> ---
>>>>>   drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>>>>   1 file changed, 12 insertions(+)
>>>>>
>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>> index 3f2f1fc68b52..c04a89310c59 100644
>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>>>>       if (reg & CR0_SMMUEN)
>>>>>           dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>>>>
>>>>> +    /*
>>>>> +     * Abort all incoming translations. This can happen in a kdump case
>>>>> +     * where SMMU is initialized when a prior DMA is pending. Just
>>>>> +     * disabling the SMMU in this case might result in writes to invalid
>>>>> +     * PAs.
>>>>> +     */
>>>>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>>>>> +    if (ret) {
>>>>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>>       ret = arm_smmu_device_disable(smmu);
>>>>>       if (ret)
>>>>>           return ret;
>>>>
>>>> A tangential question: can we reliably detect that the SMMU already
>>>> has valid mappings, which would indicate that we're in a pretty bad
>>>> shape already by the time we set that bit? For all we know, memory
>>>> could have been corrupted long before we hit this point, and this
>>>> patch barely narrows the window of opportunity.
>>>
>>> :) Yes that is correct. This only covers the kdump scenario. Trying
>>> to get some reliability when booting up the crash kernel. The system
>>> is already in a bad state. I don't think that this will happen in a
>>> normal scenario. But please point me to the GICv3 change and I'll
>>> have a look.
>>
>> See this:
>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407
> 
> The nearest equivalent to that is probably the top-level SMMUEN check 
> that we already have (see the diff context above). To go beyond that 
> you'd have to chase the old stream table pointer and scan the whole 
> thing looking for valid contexts, then potentially walk page tables 
> within those contexts to check for live translations if you really 
> wanted to be sure. That would be a hell of a lot of work to do in the 
> boot path.
Yeah, feels a bit too involved for sanity. I'd simply suggest you taint
the kernel if you find the SMMU enabled, as you're already on shaky ground.

Thanks,

	M.
Goel, Sameer May 11, 2018, 4:15 p.m. UTC | #9
On 4/12/2018 5:56 AM, Marc Zyngier wrote:
> On 12/04/18 11:17, Robin Murphy wrote:
>> On 11/04/18 17:54, Marc Zyngier wrote:
>>> Hi Sameer,
>>>
>>> On 11/04/18 16:58, Goel, Sameer wrote:
>>>>
>>>>
>>>> On 3/28/2018 9:00 AM, Marc Zyngier wrote:
>>>>> On 2018-03-28 15:39, Timur Tabi wrote:
>>>>>> From: Sameer Goel <sgoel@codeaurora.org>
>>>>>>
>>>>>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>>>>>> when SMMUEN==0.
>>>>>>
>>>>>> This prevents a race condition where a stray DMA from the crashed primary
>>>>>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>>>>>> disabled during reset in the crash kernel.
>>>>>>
>>>>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>>>>> ---
>>>>>>   drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>>>>>   1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>>> index 3f2f1fc68b52..c04a89310c59 100644
>>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>>>>>       if (reg & CR0_SMMUEN)
>>>>>>           dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>>>>>
>>>>>> +    /*
>>>>>> +     * Abort all incoming translations. This can happen in a kdump case
>>>>>> +     * where SMMU is initialized when a prior DMA is pending. Just
>>>>>> +     * disabling the SMMU in this case might result in writes to invalid
>>>>>> +     * PAs.
>>>>>> +     */
>>>>>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>>>>>> +    if (ret) {
>>>>>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>>>>>> +        return ret;
>>>>>> +    }
>>>>>> +
>>>>>>       ret = arm_smmu_device_disable(smmu);
>>>>>>       if (ret)
>>>>>>           return ret;
>>>>>
>>>>> A tangential question: can we reliably detect that the SMMU already
>>>>> has valid mappings, which would indicate that we're in a pretty bad
>>>>> shape already by the time we set that bit? For all we know, memory
>>>>> could have been corrupted long before we hit this point, and this
>>>>> patch barely narrows the window of opportunity.
>>>>
>>>> :) Yes that is correct. This only covers the kdump scenario. Trying
>>>> to get some reliability when booting up the crash kernel. The system
>>>> is already in a bad state. I don't think that this will happen in a
>>>> normal scenario. But please point me to the GICv3 change and I'll
>>>> have a look.
>>>
>>> See this:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407
>>
>> The nearest equivalent to that is probably the top-level SMMUEN check 
>> that we already have (see the diff context above). To go beyond that 
>> you'd have to chase the old stream table pointer and scan the whole 
>> thing looking for valid contexts, then potentially walk page tables 
>> within those contexts to check for live translations if you really 
>> wanted to be sure. That would be a hell of a lot of work to do in the 
>> boot path.
> Yeah, feels a bit too involved for sanity. I'd simply suggest you taint
> the kernel if you find the SMMU enabled, as you're already on shaky ground.

Ok. I think since this is a kdump kernel a taint is not necessary?
> 
> Thanks,
> 
> 	M.
>
Nate Watterson May 11, 2018, 8:52 p.m. UTC | #10
Hi Marc,

On 4/12/2018 7:56 AM, Marc Zyngier wrote:
> On 12/04/18 11:17, Robin Murphy wrote:
>> On 11/04/18 17:54, Marc Zyngier wrote:
>>> Hi Sameer,
>>>
>>> On 11/04/18 16:58, Goel, Sameer wrote:
>>>>
>>>>
>>>> On 3/28/2018 9:00 AM, Marc Zyngier wrote:
>>>>> On 2018-03-28 15:39, Timur Tabi wrote:
>>>>>> From: Sameer Goel <sgoel@codeaurora.org>
>>>>>>
>>>>>> Set SMMU_GBPA to abort all incoming translations during the SMMU reset
>>>>>> when SMMUEN==0.
>>>>>>
>>>>>> This prevents a race condition where a stray DMA from the crashed primary
>>>>>> kernel can try to access an IOVA address as an invalid PA when SMMU is
>>>>>> disabled during reset in the crash kernel.
>>>>>>
>>>>>> Signed-off-by: Sameer Goel <sgoel@codeaurora.org>
>>>>>> ---
>>>>>>    drivers/iommu/arm-smmu-v3.c | 12 ++++++++++++
>>>>>>    1 file changed, 12 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>>>>>> index 3f2f1fc68b52..c04a89310c59 100644
>>>>>> --- a/drivers/iommu/arm-smmu-v3.c
>>>>>> +++ b/drivers/iommu/arm-smmu-v3.c
>>>>>> @@ -2458,6 +2458,18 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
>>>>>>        if (reg & CR0_SMMUEN)
>>>>>>            dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
>>>>>>
>>>>>> +    /*
>>>>>> +     * Abort all incoming translations. This can happen in a kdump case
>>>>>> +     * where SMMU is initialized when a prior DMA is pending. Just
>>>>>> +     * disabling the SMMU in this case might result in writes to invalid
>>>>>> +     * PAs.
>>>>>> +     */
>>>>>> +    ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
>>>>>> +    if (ret) {
>>>>>> +        dev_err(smmu->dev, "GBPA not responding to update\n");
>>>>>> +        return ret;
>>>>>> +    }
>>>>>> +
>>>>>>        ret = arm_smmu_device_disable(smmu);
>>>>>>        if (ret)
>>>>>>            return ret;
>>>>>
>>>>> A tangential question: can we reliably detect that the SMMU already
>>>>> has valid mappings, which would indicate that we're in a pretty bad
>>>>> shape already by the time we set that bit? For all we know, memory
>>>>> could have been corrupted long before we hit this point, and this
>>>>> patch barely narrows the window of opportunity.
>>>>
>>>> :) Yes that is correct. This only covers the kdump scenario. Trying
>>>> to get some reliability when booting up the crash kernel. The system
>>>> is already in a bad state. I don't think that this will happen in a
>>>> normal scenario. But please point me to the GICv3 change and I'll
>>>> have a look.
>>>
>>> See this:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/tree/drivers/irqchip/irq-gic-v3-its.c?h=irq/irqchip-4.17&id=6eb486b66a3094cdcd68dc39c9df3a29d6a51dd5#n3407
>>
>> The nearest equivalent to that is probably the top-level SMMUEN check
>> that we already have (see the diff context above). To go beyond that
>> you'd have to chase the old stream table pointer and scan the whole
>> thing looking for valid contexts, then potentially walk page tables
>> within those contexts to check for live translations if you really
>> wanted to be sure. That would be a hell of a lot of work to do in the
>> boot path.
> Yeah, feels a bit too involved for sanity. I'd simply suggest you taint
> the kernel if you find the SMMU enabled, as you're already on shaky ground.

Finding the SMMU already enabled does not necessarily indicate that
anything catastrophic has occurred.

For instance, to support OSes without an SMMUv3 driver, boot FW may have
enabled the SMMU and installed 1-to-1 mappings for DDR and MSI target
addr(s) to compensate for a MSI-capable master whose default DMA attrs
needed tweaking (ex: non-coherent -> coherent).

If such a configuration warrants tainting the kernel, then we should
similarly check GBPA for attr overrides and taint the kernel if any are
found there.

> 
> Thanks,
> 
> 	M.
>

Patch

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f2f1fc68b52..c04a89310c59 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2458,6 +2458,18 @@  static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
 	if (reg & CR0_SMMUEN)
 		dev_warn(smmu->dev, "SMMU currently enabled! Resetting...\n");
 
+	/*
+	 * Abort all incoming transactions. In a kdump kernel the SMMU may be
+	 * initialized while DMA from the crashed kernel is still pending, and
+	 * just disabling the SMMU might then result in writes to invalid
+	 * PAs.
+	 */
+	ret = arm_smmu_update_gbpa(smmu, 1, GBPA_ABORT);
+	if (ret) {
+		dev_err(smmu->dev, "GBPA not responding to update\n");
+		return ret;
+	}
+
 	ret = arm_smmu_device_disable(smmu);
 	if (ret)
 		return ret;
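
The error message in the hunk above ("GBPA not responding to update") refers to the register's update handshake. Below is a minimal user-space model of that handshake, offered as a sketch rather than as the driver's implementation. Everything prefixed model_ is hypothetical, the "hardware" is a struct that clears the UPDATE bit after a configurable number of reads, and the convention of passing one mask of bits to set and one mask of bits to clear is an assumption of this model. Only the GBPA_UPDATE and GBPA_ABORT bit positions are taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define GBPA_UPDATE (UINT32_C(1) << 31)
#define GBPA_ABORT  (UINT32_C(1) << 20)

/* Modelled register: UPDATE self-clears after update_latency reads. */
struct model_smmu {
	uint32_t gbpa;
	int update_latency; /* reads needed before UPDATE clears */
	int polls_left;     /* countdown for the update in flight */
};

static void model_write_gbpa(struct model_smmu *s, uint32_t val)
{
	s->gbpa = val;
	s->polls_left = s->update_latency;
}

static uint32_t model_read_gbpa(struct model_smmu *s)
{
	/* The countdown only runs while an update is pending. */
	if ((s->gbpa & GBPA_UPDATE) && s->polls_left-- <= 0)
		s->gbpa &= ~GBPA_UPDATE;
	return s->gbpa;
}

/* Poll until UPDATE reads as zero; -1 means the poll budget ran out. */
static int model_wait_update_clear(struct model_smmu *s, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (!(model_read_gbpa(s) & GBPA_UPDATE))
			return 0;
	}
	return -1;
}

/* Wait for idle, write the new value with UPDATE set, wait for completion. */
static int model_update_gbpa(struct model_smmu *s, uint32_t set, uint32_t clr,
			     int max_polls)
{
	uint32_t reg;

	if (model_wait_update_clear(s, max_polls))
		return -1;

	reg = s->gbpa;
	reg &= ~clr;
	reg |= set;
	model_write_gbpa(s, reg | GBPA_UPDATE);

	return model_wait_update_clear(s, max_polls);
}
```

Calling model_update_gbpa(&s, GBPA_ABORT, 0, 100) on a responsive model leaves ABORT set and UPDATE clear; with a latency larger than the poll budget the call returns -1, which corresponds to the failure path the patch reports with dev_err().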