
[v5,1/3] PCI: qcom: Enable cache coherency for SA8775P RC

Message ID 1708697021-16877-2-git-send-email-quic_msarkar@quicinc.com (mailing list archive)
State Superseded
Delegated to: Manivannan Sadhasivam
Series arm64: qcom: sa8775p: add cache coherency support for SA8775P

Commit Message

Mrinmay Sarkar Feb. 23, 2024, 2:03 p.m. UTC
Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
the requester is indicating that there no cache coherency issues exit
for the addressed memory on the host i.e., memory is not cached. But
in reality, requester cannot assume this unless there is a complete
control/visibility over the addressed memory on the host.

And worst case, if the memory is cached on the host, it may lead to
memory corruption issues. It should be noted that the caching of memory
on the host is not solely dependent on the NO_SNOOP attribute in TLP.

So to avoid the corruption, this patch overrides the NO_SNOOP attribute
by setting the PCIE_PARF_NO_SNOOP_OVERIDE register. This patch is not
needed for other upstream supported platforms since they do not set
NO_SNOOP attribute by default.

8775 has IP version 1.34.0 so intruduce a new cfg(cfg_1_34_0) for this
platform. Assign enable_cache_snoop flag into struct qcom_pcie_cfg and
set it true in cfg_1_34_0 and enable cache snooping if this particular
flag is true.

Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
---
 drivers/pci/controller/dwc/pcie-qcom.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

Comments

Bjorn Helgaas Feb. 23, 2024, 10:54 p.m. UTC | #1
On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> the requester is indicating that there no cache coherency issues exit
> for the addressed memory on the host i.e., memory is not cached. But
> in reality, requester cannot assume this unless there is a complete
> control/visibility over the addressed memory on the host.

s/that there no/that no/
s/exit/exist/

Forgive my ignorance here.  It sounds like the cache coherency issue
would refer to system memory, so the relevant No Snoop attribute would
be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
Endpoints.  But it looks like this patch would affect TLPs initiated
by the Root Complex, not those from Endpoints, so I'm confused about 
how this works.

If this were in the qcom-ep driver, it would make sense that setting
No Snoop in the TLPs initiated by the Endpoint could be a problem, but
that doesn't seem to be what this patch is concerned with.

> And worst case, if the memory is cached on the host, it may lead to
> memory corruption issues. It should be noted that the caching of memory
> on the host is not solely dependent on the NO_SNOOP attribute in TLP.
> 
> So to avoid the corruption, this patch overrides the NO_SNOOP attribute
> by setting the PCIE_PARF_NO_SNOOP_OVERIDE register. This patch is not
> needed for other upstream supported platforms since they do not set
> NO_SNOOP attribute by default.
> 
> 8775 has IP version 1.34.0 so intruduce a new cfg(cfg_1_34_0) for this
> platform. Assign enable_cache_snoop flag into struct qcom_pcie_cfg and
> set it true in cfg_1_34_0 and enable cache snooping if this particular
> flag is true.

s/intruduce/introduce/

Bjorn
Mrinmay Sarkar Feb. 28, 2024, 1:04 p.m. UTC | #2
On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
>> Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
>> in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
>> the requester is indicating that there no cache coherency issues exit
>> for the addressed memory on the host i.e., memory is not cached. But
>> in reality, requester cannot assume this unless there is a complete
>> control/visibility over the addressed memory on the host.
> s/that there no/that no/
> s/exit/exist/
>
> Forgive my ignorance here.  It sounds like the cache coherency issue
> would refer to system memory, so the relevant No Snoop attribute would
> be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
> Endpoints.  But it looks like this patch would affect TLPs initiated
> by the Root Complex, not those from Endpoints, so I'm confused about
> how this works.
>
> If this were in the qcom-ep driver, it would make sense that setting
> No Snoop in the TLPs initiated by the Endpoint could be a problem, but
> that doesn't seem to be what this patch is concerned with.
I think a cache coherency issue might occur in a multiprocessor system,
and the RC also needs to snoop the cache to avoid coherency issues, since
snooping is not enabled by default.

We are enabling this feature for the qcom-ep driver as well; it is in
patch 2.

Thanks
Mrinmay
Bjorn Helgaas Feb. 28, 2024, 3:02 p.m. UTC | #3
On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> > > in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> > > the requester is indicating that there no cache coherency issues exit
> > > for the addressed memory on the host i.e., memory is not cached. But
> > > in reality, requester cannot assume this unless there is a complete
> > > control/visibility over the addressed memory on the host.
> > 
> > Forgive my ignorance here.  It sounds like the cache coherency issue
> > would refer to system memory, so the relevant No Snoop attribute would
> > be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
> > Endpoints.  But it looks like this patch would affect TLPs initiated
> > by the Root Complex, not those from Endpoints, so I'm confused about
> > how this works.
> > 
> > If this were in the qcom-ep driver, it would make sense that setting
> > No Snoop in the TLPs initiated by the Endpoint could be a problem, but
> > that doesn't seem to be what this patch is concerned with.
>
> I think in multiprocessor system cache coherency issue might occur.
> and RC as well needs to snoop cache to avoid coherency as it is not
> enable by default.

My mental picture isn't detailed enough, so I'm still confused.  We're
talking about TLPs initiated by the RC.  Normally these would be
because a driver did a CPU load or store to a PCIe device MMIO space,
not to system memory.

But I guess you're suggesting the RC can initiate a TLP with a system
memory address?  And this TLP would be routed not to a Root Port or to
downstream devices, but it would instead be kind of a loopback and be
routed back up through the RC and maybe IOMMU, to system memory?

I would have expected accesses like this to be routed directly to
system memory without ever reaching the PCIe RC.
Manivannan Sadhasivam Feb. 28, 2024, 5:14 p.m. UTC | #4
On Wed, Feb 28, 2024 at 09:02:11AM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> > On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > > Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> > > > in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> > > > the requester is indicating that there no cache coherency issues exit
> > > > for the addressed memory on the host i.e., memory is not cached. But
> > > > in reality, requester cannot assume this unless there is a complete
> > > > control/visibility over the addressed memory on the host.
> > > 
> > > Forgive my ignorance here.  It sounds like the cache coherency issue
> > > would refer to system memory, so the relevant No Snoop attribute would
> > > be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
> > > Endpoints.  But it looks like this patch would affect TLPs initiated
> > > by the Root Complex, not those from Endpoints, so I'm confused about
> > > how this works.
> > > 
> > > If this were in the qcom-ep driver, it would make sense that setting
> > > No Snoop in the TLPs initiated by the Endpoint could be a problem, but
> > > that doesn't seem to be what this patch is concerned with.
> >
> > I think in multiprocessor system cache coherency issue might occur.
> > and RC as well needs to snoop cache to avoid coherency as it is not
> > enable by default.
> 
> My mental picture isn't detailed enough, so I'm still confused.  We're
> talking about TLPs initiated by the RC.  Normally these would be
> because a driver did a CPU load or store to a PCIe device MMIO space,
> not to system memory.
> 

An endpoint can expose its system memory as a BAR to the host. In that case,
the cache coherency issue would apply to TLPs originating from the RC as well.

- Mani
Bjorn Helgaas Feb. 28, 2024, 5:39 p.m. UTC | #5
On Wed, Feb 28, 2024 at 10:44:12PM +0530, Manivannan Sadhasivam wrote:
> On Wed, Feb 28, 2024 at 09:02:11AM -0600, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> > > On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > > > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > > > Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> > > > > in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> > > > > the requester is indicating that there no cache coherency issues exit
> > > > > for the addressed memory on the host i.e., memory is not cached. But
> > > > > in reality, requester cannot assume this unless there is a complete
> > > > > control/visibility over the addressed memory on the host.
> > > > 
> > > > Forgive my ignorance here.  It sounds like the cache coherency issue
> > > > would refer to system memory, so the relevant No Snoop attribute would
> > > > be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
> > > > Endpoints.  But it looks like this patch would affect TLPs initiated
> > > > by the Root Complex, not those from Endpoints, so I'm confused about
> > > > how this works.
> > > > 
> > > > If this were in the qcom-ep driver, it would make sense that setting
> > > > No Snoop in the TLPs initiated by the Endpoint could be a problem, but
> > > > that doesn't seem to be what this patch is concerned with.
> > >
> > > I think in multiprocessor system cache coherency issue might occur.
> > > and RC as well needs to snoop cache to avoid coherency as it is not
> > > enable by default.
> > 
> > My mental picture isn't detailed enough, so I'm still confused.  We're
> > talking about TLPs initiated by the RC.  Normally these would be
> > because a driver did a CPU load or store to a PCIe device MMIO space,
> > not to system memory.
> 
> Endpoint can expose its system memory as a BAR to the host. In that case, the
> cache coherency issue would apply for TLPs originating from RC as well.

What PCIe transactions are involved here?  So far I know about:

  RC initiates Memory Read Request (or Write) with NO_SNOOP==0
  ...
  EP responds with Completion with Data (for Read) 

But I guess you're saying the EP would initiate other transactions in
the middle related to snooping?  I don't know what those are.
Manivannan Sadhasivam Feb. 28, 2024, 6:45 p.m. UTC | #6
On Wed, Feb 28, 2024 at 11:39:07AM -0600, Bjorn Helgaas wrote:
> On Wed, Feb 28, 2024 at 10:44:12PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Feb 28, 2024 at 09:02:11AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> > > > On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > > > > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > > > > Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> > > > > > in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> > > > > > the requester is indicating that there no cache coherency issues exit
> > > > > > for the addressed memory on the host i.e., memory is not cached. But
> > > > > > in reality, requester cannot assume this unless there is a complete
> > > > > > control/visibility over the addressed memory on the host.
> > > > > 
> > > > > Forgive my ignorance here.  It sounds like the cache coherency issue
> > > > > would refer to system memory, so the relevant No Snoop attribute would
> > > > > be in DMA transactions, i.e., Memory Reads or Writes initiated by PCIe
> > > > > Endpoints.  But it looks like this patch would affect TLPs initiated
> > > > > by the Root Complex, not those from Endpoints, so I'm confused about
> > > > > how this works.
> > > > > 
> > > > > If this were in the qcom-ep driver, it would make sense that setting
> > > > > No Snoop in the TLPs initiated by the Endpoint could be a problem, but
> > > > > that doesn't seem to be what this patch is concerned with.
> > > >
> > > > I think in multiprocessor system cache coherency issue might occur.
> > > > and RC as well needs to snoop cache to avoid coherency as it is not
> > > > enable by default.
> > > 
> > > My mental picture isn't detailed enough, so I'm still confused.  We're
> > > talking about TLPs initiated by the RC.  Normally these would be
> > > because a driver did a CPU load or store to a PCIe device MMIO space,
> > > not to system memory.
> > 
> > Endpoint can expose its system memory as a BAR to the host. In that case, the
> > cache coherency issue would apply for TLPs originating from RC as well.
> 
> What PCIe transactions are involved here?  So far I know about:
> 
>   RC initiates Memory Read Request (or Write) with NO_SNOOP==0
>   ...
>   EP responds with Completion with Data (for Read) 
> 

The memory on the endpoint may be cached (due to the linear map and such). So
if the RC initiates an MWr TLP with NO_SNOOP=1, there would be coherency
issues, because there is no guarantee that the memory is not cached on the
endpoint. Writing directly to the DDR without snooping the caches would
therefore cause coherency issues on the endpoint as well.

- Mani
Bjorn Helgaas Feb. 28, 2024, 7:34 p.m. UTC | #7
On Thu, Feb 29, 2024 at 12:15:02AM +0530, Manivannan Sadhasivam wrote:
> On Wed, Feb 28, 2024 at 11:39:07AM -0600, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2024 at 10:44:12PM +0530, Manivannan Sadhasivam wrote:
> > > On Wed, Feb 28, 2024 at 09:02:11AM -0600, Bjorn Helgaas wrote:
> > > > On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> > > > > On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > > > > > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > > > > > Due to some hardware changes, SA8775P has set the
> > > > > > > NO_SNOOP attribute in its TLP for all the PCIe
> > > > > > > controllers. NO_SNOOP attribute when set, the requester
> > > > > > > is indicating that there no cache coherency issues exit
> > > > > > > for the addressed memory on the host i.e., memory is not
> > > > > > > cached. But in reality, requester cannot assume this
> > > > > > > unless there is a complete control/visibility over the
> > > > > > > addressed memory on the host.
> > > > > > 
> > > > > > Forgive my ignorance here.  It sounds like the cache
> > > > > > coherency issue would refer to system memory, so the
> > > > > > relevant No Snoop attribute would be in DMA transactions,
> > > > > > i.e., Memory Reads or Writes initiated by PCIe Endpoints.
> > > > > > But it looks like this patch would affect TLPs initiated
> > > > > > by the Root Complex, not those from Endpoints, so I'm
> > > > > > confused about how this works.
> > > > > > 
> > > > > > If this were in the qcom-ep driver, it would make sense
> > > > > > that setting No Snoop in the TLPs initiated by the
> > > > > > Endpoint could be a problem, but that doesn't seem to be
> > > > > > what this patch is concerned with.
> > > > >
> > > > > I think in multiprocessor system cache coherency issue might
> > > > > occur.  and RC as well needs to snoop cache to avoid
> > > > > coherency as it is not enable by default.
> > > > 
> > > > My mental picture isn't detailed enough, so I'm still
> > > > confused.  We're talking about TLPs initiated by the RC.
> > > > Normally these would be because a driver did a CPU load or
> > > > store to a PCIe device MMIO space, not to system memory.
> > > 
> > > Endpoint can expose its system memory as a BAR to the host. In
> > > that case, the cache coherency issue would apply for TLPs
> > > originating from RC as well.
> > 
> > What PCIe transactions are involved here?  So far I know about:
> > 
> >   RC initiates Memory Read Request (or Write) with NO_SNOOP==0
> >     ...
> >   EP responds with Completion with Data (for Read) 
> 
> The memory on the endpoint may be cached (due to linear map and
> such). So if the RC is initiating the MWd TLP with NO_SNOOP=1, then
> there would be coherency issues because there is no guarantee that
> the memory is not cached on the endpoint. So, not snooping the
> caches and directly writing to the DDR would cause coherency issues
> on the endpoint as well.

I don't know what linear map is, but I'll take your word for it that
endpoints are allowed to cache things internally.  So I guess in the
ideal world there might be a way for a driver to specify no-snoop for
accesses to its device if it knows there is no caching.

The commit log for this patch refers to caching on the *host*, though,
and IIUC you're saying this patch clears NO_SNOOP on TLPs from the RC
because of potential coherency issues on the *endpoint*.
Manivannan Sadhasivam March 4, 2024, 6 a.m. UTC | #8
On Wed, Feb 28, 2024 at 01:34:41PM -0600, Bjorn Helgaas wrote:
> On Thu, Feb 29, 2024 at 12:15:02AM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Feb 28, 2024 at 11:39:07AM -0600, Bjorn Helgaas wrote:
> > > On Wed, Feb 28, 2024 at 10:44:12PM +0530, Manivannan Sadhasivam wrote:
> > > > On Wed, Feb 28, 2024 at 09:02:11AM -0600, Bjorn Helgaas wrote:
> > > > > On Wed, Feb 28, 2024 at 06:34:11PM +0530, Mrinmay Sarkar wrote:
> > > > > > On 2/24/2024 4:24 AM, Bjorn Helgaas wrote:
> > > > > > > On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:
> > > > > > > > Due to some hardware changes, SA8775P has set the
> > > > > > > > NO_SNOOP attribute in its TLP for all the PCIe
> > > > > > > > controllers. NO_SNOOP attribute when set, the requester
> > > > > > > > is indicating that there no cache coherency issues exit
> > > > > > > > for the addressed memory on the host i.e., memory is not
> > > > > > > > cached. But in reality, requester cannot assume this
> > > > > > > > unless there is a complete control/visibility over the
> > > > > > > > addressed memory on the host.
> > > > > > > 
> > > > > > > Forgive my ignorance here.  It sounds like the cache
> > > > > > > coherency issue would refer to system memory, so the
> > > > > > > relevant No Snoop attribute would be in DMA transactions,
> > > > > > > i.e., Memory Reads or Writes initiated by PCIe Endpoints.
> > > > > > > But it looks like this patch would affect TLPs initiated
> > > > > > > by the Root Complex, not those from Endpoints, so I'm
> > > > > > > confused about how this works.
> > > > > > > 
> > > > > > > If this were in the qcom-ep driver, it would make sense
> > > > > > > that setting No Snoop in the TLPs initiated by the
> > > > > > > Endpoint could be a problem, but that doesn't seem to be
> > > > > > > what this patch is concerned with.
> > > > > >
> > > > > > I think in multiprocessor system cache coherency issue might
> > > > > > occur.  and RC as well needs to snoop cache to avoid
> > > > > > coherency as it is not enable by default.
> > > > > 
> > > > > My mental picture isn't detailed enough, so I'm still
> > > > > confused.  We're talking about TLPs initiated by the RC.
> > > > > Normally these would be because a driver did a CPU load or
> > > > > store to a PCIe device MMIO space, not to system memory.
> > > > 
> > > > Endpoint can expose its system memory as a BAR to the host. In
> > > > that case, the cache coherency issue would apply for TLPs
> > > > originating from RC as well.
> > > 
> > > What PCIe transactions are involved here?  So far I know about:
> > > 
> > >   RC initiates Memory Read Request (or Write) with NO_SNOOP==0
> > >     ...
> > >   EP responds with Completion with Data (for Read) 
> > 
> > The memory on the endpoint may be cached (due to linear map and
> > such). So if the RC is initiating the MWd TLP with NO_SNOOP=1, then
> > there would be coherency issues because there is no guarantee that
> > the memory is not cached on the endpoint. So, not snooping the
> > caches and directly writing to the DDR would cause coherency issues
> > on the endpoint as well.
> 
> I don't know what linear map is, but I'll take your word for it that
> endpoints are allowed to cache things internally.  So I guess in the
> ideal world there might be a way for a driver to specify no-snoop for
> accesses to its device if it knows there is no caching.
> 

By "linear map" I meant the Linux kernel's mapping of the DDR space. But the
endpoint need not run Linux; it could run any RTOS or even bare-metal code. So
it is certainly possible that the BAR memory is cached.

> The commit log for this patch refers to caching on the *host*, though,
> and IIUC you're saying this patch clears NO_SNOOP on TLPs from the RC
> because of potential coherency issues on the *endpoint*.
> 

Yeah, the commit message was wrong. I shared the wording during the review of
the previous version and it got duplicated for both the RC and EP patches :(

This should be fixed.

- Mani
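
To make the scenario discussed in this thread concrete, here is a minimal
userspace sketch of why an inbound write with NO_SNOOP set can leave the
endpoint CPU reading stale data. It is a toy model, not driver code: the
struct ep_mem type and the ep_cpu_read() and inbound_mwr() helpers are
hypothetical names invented for illustration, and the "snoop" is reduced to
invalidating a single cached word.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model of endpoint memory exposed through a BAR: one DDR word plus
 * one cached copy of it. All names here are hypothetical.
 */
struct ep_mem {
	uint32_t ddr;		/* backing DDR contents */
	uint32_t cache;		/* cached copy seen by the endpoint CPU */
	bool cache_valid;	/* true if the CPU holds a cached copy */
};

/* Endpoint CPU load: served from the cache when a copy is present. */
static uint32_t ep_cpu_read(struct ep_mem *m)
{
	if (!m->cache_valid) {
		m->cache = m->ddr;
		m->cache_valid = true;
	}
	return m->cache;
}

/*
 * Inbound memory write from the RC. With no_snoop set, the write goes
 * straight to DDR and any cached copy is left untouched. Otherwise the
 * cached copy is invalidated first (the "snoop" in this model).
 */
static void inbound_mwr(struct ep_mem *m, uint32_t val, bool no_snoop)
{
	if (!no_snoop)
		m->cache_valid = false;
	m->ddr = val;
}
```

In this model, a CPU read that follows a no_snoop write still returns the old
cached value, while a read that follows a snooped write returns the new one.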
Manivannan Sadhasivam March 4, 2024, 6:07 a.m. UTC | #9
On Fri, Feb 23, 2024 at 07:33:38PM +0530, Mrinmay Sarkar wrote:

Subject should be:

"PCI: qcom: Override NO_SNOOP attribute for SA8775P"

> Due to some hardware changes, SA8775P has set the NO_SNOOP attribute
> in its TLP for all the PCIe controllers. NO_SNOOP attribute when set,
> the requester is indicating that there no cache coherency issues exit
> for the addressed memory on the host i.e., memory is not cached. But

s/host/endpoint

> in reality, requester cannot assume this unless there is a complete
> control/visibility over the addressed memory on the host.
> 

s/host/endpoint

> And worst case, if the memory is cached on the host, it may lead to

s/host/endpoint

> memory corruption issues. It should be noted that the caching of memory
> on the host is not solely dependent on the NO_SNOOP attribute in TLP.
> 

s/host/endpoint

> So to avoid the corruption, this patch overrides the NO_SNOOP attribute
> by setting the PCIE_PARF_NO_SNOOP_OVERIDE register. This patch is not
> needed for other upstream supported platforms since they do not set
> NO_SNOOP attribute by default.
> 
> 8775 has IP version 1.34.0 so intruduce a new cfg(cfg_1_34_0) for this
> platform. Assign enable_cache_snoop flag into struct qcom_pcie_cfg and
> set it true in cfg_1_34_0 and enable cache snooping if this particular
> flag is true.
> 
> Signed-off-by: Mrinmay Sarkar <quic_msarkar@quicinc.com>
> ---
>  drivers/pci/controller/dwc/pcie-qcom.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> index 2ce2a3bd932b..872be7f7d7b3 100644
> --- a/drivers/pci/controller/dwc/pcie-qcom.c
> +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> @@ -51,6 +51,7 @@
>  #define PARF_SID_OFFSET				0x234
>  #define PARF_BDF_TRANSLATE_CFG			0x24c
>  #define PARF_SLV_ADDR_SPACE_SIZE		0x358
> +#define PARF_NO_SNOOP_OVERIDE			0x3d4
>  #define PARF_DEVICE_TYPE			0x1000
>  #define PARF_BDF_TO_SID_TABLE_N			0x2000
>  
> @@ -117,6 +118,10 @@
>  /* PARF_LTSSM register fields */
>  #define LTSSM_EN				BIT(8)
>  
> +/* PARF_NO_SNOOP_OVERIDE register fields */
> +#define WR_NO_SNOOP_OVERIDE_EN			BIT(1)
> +#define RD_NO_SNOOP_OVERIDE_EN			BIT(3)
> +
>  /* PARF_DEVICE_TYPE register fields */
>  #define DEVICE_TYPE_RC				0x4
>  
> @@ -229,6 +234,7 @@ struct qcom_pcie_ops {
>  

Please add Kdoc comments for this struct. And describe the "override_no_snoop"
member as below:

"Override NO_SNOOP attribute in TLP to enable cache snooping"

>  struct qcom_pcie_cfg {
>  	const struct qcom_pcie_ops *ops;
> +	bool enable_cache_snoop;

Rename this to "override_no_snoop"

>  };
>  
>  struct qcom_pcie {
> @@ -961,6 +967,13 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
>  
>  static int qcom_pcie_post_init_2_7_0(struct qcom_pcie *pcie)
>  {
> +	const struct qcom_pcie_cfg *pcie_cfg = pcie->cfg;
> +
> +	/* Enable cache snooping for SA8775P */

Remove this comment in favor of Kdoc mentioned above.

- Mani
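
For reference, the review comments above could be applied roughly as in the
following sketch. It only covers the struct and the new config entry. The
member description is the wording requested in the review; the summary line
and the @ops description are assumptions, and .ops is left unset so the
fragment compiles on its own outside the kernel tree.

```c
#include <stdbool.h>

struct qcom_pcie_ops;	/* opaque here; the real definition is in pcie-qcom.c */

/**
 * struct qcom_pcie_cfg - Per SoC config struct (summary wording is assumed)
 * @ops: qcom PCIe ops structure (description is assumed)
 * @override_no_snoop: Override NO_SNOOP attribute in TLP to enable cache
 *                     snooping
 */
struct qcom_pcie_cfg {
	const struct qcom_pcie_ops *ops;
	bool override_no_snoop;
};

/* SA8775P entry (IP version 1.34.0); .ops is omitted in this sketch */
static const struct qcom_pcie_cfg cfg_1_34_0 = {
	.override_no_snoop = true,
};
```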

Patch

diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 2ce2a3bd932b..872be7f7d7b3 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -51,6 +51,7 @@ 
 #define PARF_SID_OFFSET				0x234
 #define PARF_BDF_TRANSLATE_CFG			0x24c
 #define PARF_SLV_ADDR_SPACE_SIZE		0x358
+#define PARF_NO_SNOOP_OVERIDE			0x3d4
 #define PARF_DEVICE_TYPE			0x1000
 #define PARF_BDF_TO_SID_TABLE_N			0x2000
 
@@ -117,6 +118,10 @@ 
 /* PARF_LTSSM register fields */
 #define LTSSM_EN				BIT(8)
 
+/* PARF_NO_SNOOP_OVERIDE register fields */
+#define WR_NO_SNOOP_OVERIDE_EN			BIT(1)
+#define RD_NO_SNOOP_OVERIDE_EN			BIT(3)
+
 /* PARF_DEVICE_TYPE register fields */
 #define DEVICE_TYPE_RC				0x4
 
@@ -229,6 +234,7 @@  struct qcom_pcie_ops {
 
 struct qcom_pcie_cfg {
 	const struct qcom_pcie_ops *ops;
+	bool enable_cache_snoop;
 };
 
 struct qcom_pcie {
@@ -961,6 +967,13 @@  static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
 
 static int qcom_pcie_post_init_2_7_0(struct qcom_pcie *pcie)
 {
+	const struct qcom_pcie_cfg *pcie_cfg = pcie->cfg;
+
+	/* Enable cache snooping for SA8775P */
+	if (pcie_cfg->enable_cache_snoop)
+		writel(WR_NO_SNOOP_OVERIDE_EN | RD_NO_SNOOP_OVERIDE_EN,
+				pcie->parf + PARF_NO_SNOOP_OVERIDE);
+
 	qcom_pcie_clear_hpc(pcie->pci);
 
 	return 0;
@@ -1334,6 +1347,11 @@  static const struct qcom_pcie_cfg cfg_1_9_0 = {
 	.ops = &ops_1_9_0,
 };
 
+static const struct qcom_pcie_cfg cfg_1_34_0 = {
+	.ops = &ops_1_9_0,
+	.enable_cache_snoop = true,
+};
+
 static const struct qcom_pcie_cfg cfg_2_1_0 = {
 	.ops = &ops_2_1_0,
 };
@@ -1630,7 +1648,7 @@  static const struct of_device_id qcom_pcie_match[] = {
 	{ .compatible = "qcom,pcie-msm8996", .data = &cfg_2_3_2 },
 	{ .compatible = "qcom,pcie-qcs404", .data = &cfg_2_4_0 },
 	{ .compatible = "qcom,pcie-sa8540p", .data = &cfg_1_9_0 },
-	{ .compatible = "qcom,pcie-sa8775p", .data = &cfg_1_9_0},
+	{ .compatible = "qcom,pcie-sa8775p", .data = &cfg_1_34_0},
 	{ .compatible = "qcom,pcie-sc7280", .data = &cfg_1_9_0 },
 	{ .compatible = "qcom,pcie-sc8180x", .data = &cfg_1_9_0 },
 	{ .compatible = "qcom,pcie-sc8280xp", .data = &cfg_1_9_0 },
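
As a quick check of what the new conditional writes, the following userspace
sketch mirrors the logic added to qcom_pcie_post_init_2_7_0(). The register
offset and bit definitions are copied from the patch. Everything else is
hypothetical: fake_parf, fake_writel(), fake_readl() and post_init_sketch()
stand in for the PARF MMIO window and the kernel's writel(), and exist only so
the fragment can be built and run outside the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)				(1U << (n))

/* Offset and fields copied from the patch */
#define PARF_NO_SNOOP_OVERIDE		0x3d4
#define WR_NO_SNOOP_OVERIDE_EN		BIT(1)
#define RD_NO_SNOOP_OVERIDE_EN		BIT(3)

/* Hypothetical stand-in for the PARF register window (not real hardware) */
static uint32_t fake_parf[0x1000 / sizeof(uint32_t)];

/* Userspace substitute for writel(value, base + offset) */
static void fake_writel(uint32_t val, uint32_t offset)
{
	fake_parf[offset / sizeof(uint32_t)] = val;
}

/* Userspace substitute for readl(base + offset) */
static uint32_t fake_readl(uint32_t offset)
{
	return fake_parf[offset / sizeof(uint32_t)];
}

/* Mirrors the conditional added to qcom_pcie_post_init_2_7_0() */
static void post_init_sketch(bool enable_cache_snoop)
{
	if (enable_cache_snoop)
		fake_writel(WR_NO_SNOOP_OVERIDE_EN | RD_NO_SNOOP_OVERIDE_EN,
			    PARF_NO_SNOOP_OVERIDE);
}
```

With the flag set, the value written to PARF_NO_SNOOP_OVERIDE is
BIT(1) | BIT(3), i.e. 0xa; with the flag clear, the register is not touched.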