[V4,0/3] PCI: designware-ep: Fix DBI access before core init

Message ID	20220919183342.4090-1-vidyas@nvidia.com (mailing list archive)
From: Vidya Sagar <vidyas@nvidia.com>
To: <jingoohan1@gmail.com>, <gustavo.pimentel@synopsys.com>,
        <lpieralisi@kernel.org>, <robh@kernel.org>, <kw@linux.com>,
        <bhelgaas@google.com>, <mani@kernel.org>,
        <Sergey.Semin@baikalelectronics.ru>, <dmitry.baryshkov@linaro.org>,
        <linmq006@gmail.com>, <ffclaire1224@gmail.com>
CC: <thierry.reding@gmail.com>, <jonathanh@nvidia.com>,
        <linux-pci@vger.kernel.org>, <linux-arm-msm@vger.kernel.org>,
        <linux-kernel@vger.kernel.org>, <kthota@nvidia.com>,
        <mmaddireddy@nvidia.com>, <vidyas@nvidia.com>, <sagar.tv@gmail.com>
Subject: [PATCH V4 0/3] PCI: designware-ep: Fix DBI access before core init
Date: Tue, 20 Sep 2022 00:03:39 +0530
Message-ID: <20220919183342.4090-1-vidyas@nvidia.com>
Series: PCI: designware-ep: Fix DBI access before core init
  [V4,0/3] PCI: designware-ep: Fix DBI access before core init
  [V4,1/3] PCI: designware-ep: Fix DBI access before core init
  [V4,2/3] PCI: qcom-ep: Refactor EP initialization completion
  [V4,3/3] PCI: tegra194: Refactor EP initialization completion


Message

Vidya Sagar Sept. 19, 2022, 6:33 p.m. UTC

This series attempts to fix system hangs caused by core register
(e.g. DBI) accesses on platforms whose core initialization depends on
the availability of the PCIe reference clock from the host.
This series is verified on Tegra194 & Tegra234 platforms.

Manivannan, could you please verify on qcom platforms?

V4:
* Addressed review comments from Bjorn and Manivannan
* Added .ep_init_late() ops
* Added patches to refactor code in qcom and tegra platforms

Vidya Sagar (3):
  PCI: designware-ep: Fix DBI access before core init
  PCI: qcom-ep: Refactor EP initialization completion
  PCI: tegra194: Refactor EP initialization completion

 .../pci/controller/dwc/pcie-designware-ep.c   | 112 ++++++++++--------
 drivers/pci/controller/dwc/pcie-designware.h  |  10 +-
 drivers/pci/controller/dwc/pcie-qcom-ep.c     |  27 +++--
 drivers/pci/controller/dwc/pcie-tegra194.c    |   4 +-
 4 files changed, 85 insertions(+), 68 deletions(-)

Comments

Bjorn Helgaas Sept. 19, 2022, 10:40 p.m. UTC | #1

On Tue, Sep 20, 2022 at 12:03:39AM +0530, Vidya Sagar wrote:
> This series attempts to fix the issue with core register (Ex:- DBI) accesses
> causing system hang issues in platforms where there is a dependency on the
> availability of PCIe Reference clock from the host for their core
> initialization.
> This series is verified on Tegra194 & Tegra234 platforms.

I think this design is just kind of weird, specifically, the fact that
setting .core_init_notifier makes dw_pcie_ep_init() bail out early.
The usual pattern is more like "if the specific driver sets this
function pointer, the generic code calls it."
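
A minimal C sketch of the "usual pattern" described above, for
illustration only. The names struct ep, struct ep_ops, init_late,
ep_generic_init() and demo_init_late() are hypothetical stand-ins and
are not the actual DWC structures or functions.

```c
/* Hypothetical, simplified stand-ins; not the real DWC types. */
struct ep;

struct ep_ops {
	/* Optional hook: a platform driver sets it only if it needs it. */
	int (*init_late)(struct ep *ep);
};

struct ep {
	const struct ep_ops *ops;
	int late_calls;		/* counts hook invocations, for illustration */
};

/*
 * Generic code: always runs its own steps, and calls the hook only
 * when the specific driver has provided one. Setting the pointer adds
 * behaviour; it never makes the generic code bail out early.
 */
static int ep_generic_init(struct ep *ep)
{
	/* ... generic initialization would go here ... */

	if (ep->ops && ep->ops->init_late)
		return ep->ops->init_late(ep);

	return 0;
}

/* Example platform hook (hypothetical). */
static int demo_init_late(struct ep *ep)
{
	ep->late_calls++;
	return 0;
}
```

A driver without the hook leaves init_late unset and ep_generic_init()
still completes; a driver with the hook gets exactly one call.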

The name "dw_pcie_ep_init_complete()" is not as helpful as it could
be: it tells us something about what has happened before this point,
but it doesn't tell us anything about what dw_pcie_ep_init_complete()
*does*.

Same thing with dw_pcie_ep_init_notify() -- it doesn't tell us
anything about what the function *does*.  I see that it calls
pci_epc_init_notify(), which calls a notifier call chain (currently
always empty except for a test case).  I think pci_epc_linkup() is a
better name because it says something about what's happening: the link
is now up and we're telling somebody about it.  "pci_epc_init_notify()"
doesn't convey that.  "pci_epc_core_initialized()" might.

It looks like both qcom and tegra wait for an interrupt before calling
dw_pcie_ep_init_notify(), but I'm a little concerned because I can't
figure out what specifically they do to start the process that
ultimately generates the interrupt.  Presumably they request the IRQ
*before* starting the process, but there's not much between the
devm_request_threaded_irq() and the interrupt handler, which makes me
wonder if both are racy.


Vidya Sagar Sept. 26, 2022, 3:02 p.m. UTC | #2

On 9/20/2022 4:10 AM, Bjorn Helgaas wrote:
> On Tue, Sep 20, 2022 at 12:03:39AM +0530, Vidya Sagar wrote:
>> This series attempts to fix the issue with core register (Ex:- DBI) accesses
>> causing system hang issues in platforms where there is a dependency on the
>> availability of PCIe Reference clock from the host for their core
>> initialization.
>> This series is verified on Tegra194 & Tegra234 platforms.
> 
> I think this design is just kind of weird, specifically, the fact that
> setting .core_init_notifier makes dw_pcie_ep_init() bail out early.
> The usual pattern is more like "if the specific driver sets this
> function pointer, the generic code calls it."

Thanks for the review Bjorn.

Typically, PCIe endpoints run on the reference clock supplied by the 
host they are connected to. Our hardware designers followed the same 
approach here, with one difference: the controllers operating in 
endpoint mode are not standalone controllers but part of a larger 
Tegra (or Qcom) system.
So the complete controller initialization sequence cannot happen 
during the boot stage itself, and it needs to be split into two parts:
a) early initialization, which just parses DT and does the programming 
   that doesn't depend on the reference clock from the host, and
b) late initialization, which does the programming that can only be 
   performed after the reference clock is available from the host.
We are working with our hardware designers to avoid this dependency on 
the reference clock from the host, so that all the programming can 
happen during boot itself and the hardware is smart enough to switch to 
the reference clock from the host when it becomes available. But this 
is for future designs; Tegra194 & Tegra234 continue to have this 
limitation.
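
The two-part split can be sketched in plain C as below. This is only a
model under stated assumptions: struct demo_ep and the demo_ep_*()
functions are hypothetical names, not driver code, and the sketch
assumes that the host's reference clock is present by the time PERST#
is deasserted.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical endpoint state; not a real kernel structure. */
struct demo_ep {
	bool refclk_present;	/* host is driving the reference clock */
	bool early_done;	/* part (a) finished */
	bool core_done;		/* part (b) finished */
};

/* Part (a): runs at boot/probe and touches no core (DBI) registers. */
static int demo_ep_init_early(struct demo_ep *ep)
{
	/* ... parse DT, clock-independent programming ... */
	ep->early_done = true;
	return 0;
}

/* Part (b): core register programming, valid only with refclk present. */
static int demo_ep_init_core(struct demo_ep *ep)
{
	if (!ep->early_done)
		return -EINVAL;
	if (!ep->refclk_present)
		return -EAGAIN;	/* real hardware could hang here; refuse */

	/* ... DBI programming would go here ... */
	ep->core_done = true;
	return 0;
}

/* Assumed trigger: the host deasserts PERST# with refclk running. */
static int demo_ep_perst_deassert(struct demo_ep *ep)
{
	ep->refclk_present = true;
	return demo_ep_init_core(ep);
}
```

In this model, calling demo_ep_init_core() before the PERST# event
returns -EAGAIN instead of touching the core, which is the access the
series is trying to prevent.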

> 
> The name "dw_pcie_ep_init_complete()" is not as helpful as it could
> be: it tells us something about what has happened before this point,
> but it doesn't tell us anything about what dw_pcie_ep_init_complete()
> *does*.

To be in line with the new ep_init_late op that I added in this series, 
would it be fine to rename this to dw_pcie_ep_init_late()?

> 
> Same thing with dw_pcie_ep_init_notify() -- it doesn't tell us
> anything about what the function *does*.

Would it make more sense to rename it as dw_pcie_ep_linkup_notify()?

> I see that it calls
> pci_epc_init_notify(), which calls a notifier call chain (currently
> always empty except for a test case).  I think pci_epc_linkup() is a
> better name because it says something about what's happening: the link
> is now up and we're telling somebody about it.  "pci_epc_init_notify()"
> doesn't convey that.  "pci_epc_core_initialized()" might.

Ok. I'll rename it to pci_epc_core_initialized().

> 
> It looks like both qcom and tegra wait for an interrupt before calling
> dw_pcie_ep_init_notify(), but I'm a little concerned because I can't
> figure out what specifically they do to start the process that
> ultimately generates the interrupt.

As part of starting the endpoint, as described in 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/PCI/endpoint/pci-test-howto.rst#n101
we execute the following:
    echo 1 > controllers/141a0000.pcie-ep/start
This enables interrupt generation for toggles on the PERST# line.

> Presumably they request the IRQ
> *before* starting the process, but there's not much between the
> devm_request_threaded_irq() and the interrupt handler, which makes me
> wonder if both are racy.

I don't think there is any race between these two as the 'start' is 
initiated from the user space. Not sure if I'm missing something here 
though.
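
The ordering argued for above can be modelled as follows. This is a
sketch of the reasoning only, not of the driver: struct demo_ctrl,
demo_probe(), demo_start() and demo_perst_toggle() are hypothetical
names, and the sketch assumes that the PERST# interrupt stays disabled
from handler registration until user space writes to 'start'.

```c
#include <stdbool.h>

/* Hypothetical controller state; not a real kernel structure. */
struct demo_ctrl {
	bool handler_registered;	/* set at probe time */
	bool irq_enabled;		/* set only by 'start' */
	int handled;			/* PERST# events that reached the handler */
};

/* Probe: the handler is registered but the interrupt stays disabled. */
static void demo_probe(struct demo_ctrl *c)
{
	c->handler_registered = true;
	c->irq_enabled = false;
}

/* User space 'start': enable the interrupt once a handler exists. */
static void demo_start(struct demo_ctrl *c)
{
	if (c->handler_registered)
		c->irq_enabled = true;
}

/* A PERST# toggle reaches the handler only when the IRQ is enabled. */
static bool demo_perst_toggle(struct demo_ctrl *c)
{
	if (!c->irq_enabled)
		return false;

	c->handled++;
	return true;
}
```

Under that assumption, a PERST# toggle that arrives before 'start' is
never delivered, so the handler cannot run ahead of the start request.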


Vidya Sagar Oct. 3, 2022, 11:18 a.m. UTC | #3

Hi Bjorn,
Did you find time to take a look at my responses?
If you don't have anything to add further, I'll take care of the review 
comments as mentioned and send the V5 patch for review.
Please let me know.

Thanks,
Vidya Sagar
