Message ID | 1517999482-17317-7-git-send-email-vivek.gautam@codeaurora.org (mailing list archive) |
---|---|
State | Not Applicable, archived |
Hi Vivek, Thanks for the patch. Please see my comments inline. On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam <vivek.gautam@codeaurora.org> wrote: > While handling the concerned iommu, there should not be a > need to power control the drm devices from iommu interface. > If these drm devices need to be powered around this time, > the respective drivers should take care of this. > > Replace the pm_runtime_get/put_sync(<drm_device>) with > pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up > the connected iommu through the device link interface. > In case the device link is not setup these get/put_suppliers() > calls will be a no-op, and the iommu driver should take care of > powering on its devices accordingly. > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > --- > drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c > index b23d33622f37..1ab629bbee69 100644 > --- a/drivers/gpu/drm/msm/msm_iommu.c > +++ b/drivers/gpu/drm/msm/msm_iommu.c > @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, > struct msm_iommu *iommu = to_msm_iommu(mmu); > int ret; > > - pm_runtime_get_sync(mmu->dev); > + pm_runtime_get_suppliers(mmu->dev); > ret = iommu_attach_device(iommu->domain, mmu->dev); > - pm_runtime_put_sync(mmu->dev); > + pm_runtime_put_suppliers(mmu->dev); For me, it looks like a wrong place to handle runtime PM of IOMMU here. iommu_attach_device() calls into IOMMU driver's attach_device() callback and that's where necessary runtime PM gets should happen, if any. In other words, driver A (MSM DRM driver) shouldn't be dealing with power state of device controlled by driver B (ARM SMMU). This is also important for the reasons I stated in my comments to "[PATCH v7 1/6] base: power: runtime: Export pm_runtime_get/put_suppliers". 
Quoting for everyone's convenience: >> There are however cases in which the consumer wants to power-on >> the supplier, but not itself. >> E.g., A Graphics or multimedia driver wants to power-on the SMMU >> to unmap a buffer and finish the TLB operations without powering >> on itself. > >This sounds strange to me. If the SMMU is powered down, wouldn't the >TLB lose its contents as well (and so no flushing needed)? > >Other than that, what kind of hardware operations would be needed >besides just updating the page tables from the CPU? > In other words, the SMMU driver can deal with hardware state based on return value of pm_runtime_get_sync() or pm_runtime_get_if_in_use() and decide whether some operations are necessary or not, e.g. - a state restore is necessary if the domain was powered off, but we are bringing the master on, - a flush may not be required when (un)mapping with the domain powered off, - etc. Best regards, Tomasz
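Tomasz's suggestion can be illustrated with a small userspace model. This is a sketch, not kernel code. In it, the IOMMU driver always updates the in-memory page tables, but it only touches the hardware (here, a TLB flush) when the SMMU is already powered. That is similar in spirit to checking pm_runtime_get_if_in_use(). All `model_*` names and the struct are hypothetical. The model also assumes, as Tomasz does, that a powered-down SMMU loses its TLB contents, so no flush is needed in that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an SMMU's runtime PM state.
 * usage_count > 0 means the hardware is powered. */
struct model_smmu {
	int usage_count;
	int tlb_flushes;   /* TLB invalidations actually issued */
	int pt_updates;    /* page-table updates done by the CPU */
};

/* Loosely mirrors the idea of pm_runtime_get_if_in_use(): take a
 * reference only if the device is already active; never power it up. */
static bool model_get_if_in_use(struct model_smmu *s)
{
	if (s->usage_count > 0) {
		s->usage_count++;
		return true;
	}
	return false;
}

static void model_put(struct model_smmu *s)
{
	s->usage_count--;
}

/* Unmap: page tables live in memory, so they can always be updated.
 * The TLB is only flushed if the SMMU is powered (assumption: a
 * powered-off SMMU has lost its TLB contents anyway). */
static void model_unmap(struct model_smmu *s)
{
	s->pt_updates++;
	if (model_get_if_in_use(s)) {
		s->tlb_flushes++;
		model_put(s);
	}
}
```

With this split, whether a master driver is active or not only changes whether the hardware step is needed. It does not change who takes the power reference.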
On Tue, Feb 13, 2018 at 06:10:38PM +0900, Tomasz Figa wrote: > Hi Vivek, > > Thanks for the patch. Please see my comments inline. > > On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam > <vivek.gautam@codeaurora.org> wrote: > > While handling the concerned iommu, there should not be a > > need to power control the drm devices from iommu interface. > > If these drm devices need to be powered around this time, > > the respective drivers should take care of this. > > > > Replace the pm_runtime_get/put_sync(<drm_device>) with > > pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up > > the connected iommu through the device link interface. > > In case the device link is not setup these get/put_suppliers() > > calls will be a no-op, and the iommu driver should take care of > > powering on its devices accordingly. > > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> > > --- > > drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- > > 1 file changed, 8 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c > > index b23d33622f37..1ab629bbee69 100644 > > --- a/drivers/gpu/drm/msm/msm_iommu.c > > +++ b/drivers/gpu/drm/msm/msm_iommu.c > > @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, > > struct msm_iommu *iommu = to_msm_iommu(mmu); > > int ret; > > > > - pm_runtime_get_sync(mmu->dev); > > + pm_runtime_get_suppliers(mmu->dev); > > ret = iommu_attach_device(iommu->domain, mmu->dev); > > - pm_runtime_put_sync(mmu->dev); > > + pm_runtime_put_suppliers(mmu->dev); > > For me, it looks like a wrong place to handle runtime PM of IOMMU > here. iommu_attach_device() calls into IOMMU driver's attach_device() > callback and that's where necessary runtime PM gets should happen, if > any. In other words, driver A (MSM DRM driver) shouldn't be dealing > with power state of device controlled by driver B (ARM SMMU). 
This whole thing is confused by the fact that on MSM the GPU and the GPU IOMMU share some of the same clocks and power rail, so turning on the GPU also turned on the IOMMU register banks by extension. But if we put that aside, the question is who should be responsible for controlling the power in this relationship, and there are several good reasons to leave it up to the client device. The most important reason is when we move to the per-instance model where the GPU self-programs the SMMU registers. In that case, the driver will need to make sure that the SMMU is powered up before submitting the command and then remove the power vote when the commands are done to save energy. Additionally, there might be legitimate reasons in the driver to batch operations - you may wish to attach the device and then map several global buffers immediately - and having driver-side control prevents several unneeded power transitions. Perhaps the right answer is to do both - allow for the driver to enable the supplier but also do the right power operations at the appropriate places in the IOMMU driver. > This is also important for the reasons I stated in my comments to > "[PATCH v7 1/6] base: power: runtime: Export > pm_runtime_get/put_suppliers". Quoting for everyone's convenience: > > >> There are however cases in which the consumer wants to power-on > >> the supplier, but not itself. > >> E.g., A Graphics or multimedia driver wants to power-on the SMMU > >> to unmap a buffer and finish the TLB operations without powering > >> on itself. > > > >This sounds strange to me. If the SMMU is powered down, wouldn't the > >TLB lose its contents as well (and so no flushing needed)? > > > >Other than that, what kind of hardware operations would be needed > >besides just updating the page tables from the CPU?
> > > In other words, the SMMU driver can deal with hardware state based on > return value of pm_runtime_get_sync() or pm_runtime_get_if_in_use() > and decide whether some operations are necessary or not, e.g. > - a state restore is necessary if the domain was powered off, but we > are bringing the master on, > - a flush may not be required when (un)mapping with the domain powered off, > - etc. I agree that there is probably some advanced logic that we can do to conclusively figure out the state of the hardware and improve the behavior. I would love to see the SMMU driver get smarter but for the moment we can't trust it and so we need to force the behavior from the GPU driver. The current code works for a5x and earlier but on sdm845 we can (no longer) treat the GPU and the SMMU as the same device for power purposes so we need this code. If at some point in the future we can start to selectively remove the supplier calls I wouldn't mind one bit. Jordan
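Jordan's batching argument can be made concrete with a hypothetical userspace model that counts how often a supplier is actually powered up. It compares two cases for an attach plus several maps of global buffers. In the first, each IOMMU operation takes its own vote. In the second, the client driver holds one vote across the whole batch, which is loosely what pm_runtime_get/put_suppliers() would allow. The names are illustrative, and the model ignores autosuspend.

```c
#include <assert.h>

/* Hypothetical model: counts off->on transitions of a supplier. */
struct model_supplier {
	int usage_count;
	int power_ups;
};

static void model_supplier_get(struct model_supplier *s)
{
	if (s->usage_count++ == 0)
		s->power_ups++;   /* 0 -> 1: hardware actually powers on */
}

static void model_supplier_put(struct model_supplier *s)
{
	--s->usage_count;     /* reaching 0 would power the hardware off */
}

/* Each IOMMU operation takes its own vote (per-callback PM). */
static void model_iommu_op(struct model_supplier *s)
{
	model_supplier_get(s);
	/* ... program SMMU registers ... */
	model_supplier_put(s);
}

/* Attach, then map n global buffers. If hold_vote is nonzero, the
 * client driver keeps the supplier powered for the whole batch. */
static int model_attach_and_map(struct model_supplier *s, int n, int hold_vote)
{
	int i;

	if (hold_vote)
		model_supplier_get(s);
	model_iommu_op(s);            /* attach */
	for (i = 0; i < n; i++)
		model_iommu_op(s);    /* map */
	if (hold_vote)
		model_supplier_put(s);
	return s->power_ups;
}
```

With three maps, the per-operation variant powers the supplier up four times, and the batched variant does so once.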
On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: > Hi Vivek, > > Thanks for the patch. Please see my comments inline. > > On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam > <vivek.gautam@codeaurora.org> wrote: >> While handling the concerned iommu, there should not be a >> need to power control the drm devices from iommu interface. >> If these drm devices need to be powered around this time, >> the respective drivers should take care of this. >> >> Replace the pm_runtime_get/put_sync(<drm_device>) with >> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >> the connected iommu through the device link interface. >> In case the device link is not setup these get/put_suppliers() >> calls will be a no-op, and the iommu driver should take care of >> powering on its devices accordingly. >> >> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >> --- >> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >> 1 file changed, 8 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >> index b23d33622f37..1ab629bbee69 100644 >> --- a/drivers/gpu/drm/msm/msm_iommu.c >> +++ b/drivers/gpu/drm/msm/msm_iommu.c >> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >> struct msm_iommu *iommu = to_msm_iommu(mmu); >> int ret; >> >> - pm_runtime_get_sync(mmu->dev); >> + pm_runtime_get_suppliers(mmu->dev); >> ret = iommu_attach_device(iommu->domain, mmu->dev); >> - pm_runtime_put_sync(mmu->dev); >> + pm_runtime_put_suppliers(mmu->dev); > > For me, it looks like a wrong place to handle runtime PM of IOMMU > here. iommu_attach_device() calls into IOMMU driver's attach_device() > callback and that's where necessary runtime PM gets should happen, if > any. In other words, driver A (MSM DRM driver) shouldn't be dealing > with power state of device controlled by driver B (ARM SMMU). 
Note that we end up having to do the same, because of iommu_unmap() while DRM driver is powered off.. it might be cleaner if it was all self contained in the iommu driver, but that would make it so other drivers couldn't call iommu_unmap() from an irq handler, which is apparently something that some of them want to do.. So I'm happy with the pm_runtime_get/put_suppliers() approach as a reasonable compromise. (Perhaps specifically, attach/detach this could move inside the iommu driver, but we still need to get/put_suppliers() for unmap(), so meh) BR, -R > This is also important for the reasons I stated in my comments to > "[PATCH v7 1/6] base: power: runtime: Export > pm_runtime_get/put_suppliers". Quoting for everyone's convenience: > >>> There are however cases in which the consumer wants to power-on >>> the supplier, but not itself. >>> E.g., A Graphics or multimedia driver wants to power-on the SMMU >>> to unmap a buffer and finish the TLB operations without powering >>> on itself. >> >>This sounds strange to me. If the SMMU is powered down, wouldn't the >>TLB lose its contents as well (and so no flushing needed)? >> >>Other than that, what kind of hardware operations would be needed >>besides just updating the page tables from the CPU? >> > > In other words, the SMMU driver can deal with hardware state based on > return value of pm_runtime_get_sync() or pm_runtime_get_if_in_use() > and decide whether some operations are necessary or not, e.g. > - a state restore is necessary if the domain was powered off, but we > are bringing the master on, > - a flush may not be required when (un)mapping with the domain powered off, > - etc. > > Best regards, > Tomasz
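The following is a hypothetical userspace model of the get/put_suppliers() compromise Rob describes. The supplier (SMMU) is powered around an unmap while the consumer (GPU) stays suspended. The calls are a no-op when no device link exists, as the patch description states. This is a simplification: the kernel deals with a list of device links, while this sketch keeps a single supplier pointer. All names are illustrative.

```c
#include <assert.h>

/* Hypothetical model of a device with at most one linked supplier. */
struct model_dev {
	int usage_count;
	int resumes;                /* 0 -> 1 transitions (power-ups) */
	struct model_dev *supplier; /* single device link, or NULL */
};

static void model_get_sync(struct model_dev *d)
{
	if (d->usage_count++ == 0)
		d->resumes++;
}

static void model_put(struct model_dev *d)
{
	d->usage_count--;
}

/* Power only the supplier of d, leaving d itself suspended;
 * a no-op when no link exists. */
static void model_get_suppliers(struct model_dev *d)
{
	if (d->supplier)
		model_get_sync(d->supplier);
}

static void model_put_suppliers(struct model_dev *d)
{
	if (d->supplier)
		model_put(d->supplier);
}

/* The unmap path from the patch, in model form. Returns whether the
 * SMMU was powered while the unmap work ran. */
static int model_msm_unmap(struct model_dev *gpu)
{
	int smmu_on;

	model_get_suppliers(gpu);
	smmu_on = gpu->supplier && gpu->supplier->usage_count > 0;
	/* ... iommu_unmap() would run here ... */
	model_put_suppliers(gpu);
	return smmu_on;
}
```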
On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: > On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: >> Hi Vivek, >> >> Thanks for the patch. Please see my comments inline. >> >> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >> <vivek.gautam@codeaurora.org> wrote: >>> While handling the concerned iommu, there should not be a >>> need to power control the drm devices from iommu interface. >>> If these drm devices need to be powered around this time, >>> the respective drivers should take care of this. >>> >>> Replace the pm_runtime_get/put_sync(<drm_device>) with >>> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >>> the connected iommu through the device link interface. >>> In case the device link is not setup these get/put_suppliers() >>> calls will be a no-op, and the iommu driver should take care of >>> powering on its devices accordingly. >>> >>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>> --- >>> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >>> 1 file changed, 8 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >>> index b23d33622f37..1ab629bbee69 100644 >>> --- a/drivers/gpu/drm/msm/msm_iommu.c >>> +++ b/drivers/gpu/drm/msm/msm_iommu.c >>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >>> struct msm_iommu *iommu = to_msm_iommu(mmu); >>> int ret; >>> >>> - pm_runtime_get_sync(mmu->dev); >>> + pm_runtime_get_suppliers(mmu->dev); >>> ret = iommu_attach_device(iommu->domain, mmu->dev); >>> - pm_runtime_put_sync(mmu->dev); >>> + pm_runtime_put_suppliers(mmu->dev); >> >> For me, it looks like a wrong place to handle runtime PM of IOMMU >> here. iommu_attach_device() calls into IOMMU driver's attach_device() >> callback and that's where necessary runtime PM gets should happen, if >> any. 
In other words, driver A (MSM DRM driver) shouldn't be dealing >> with power state of device controlled by driver B (ARM SMMU). > > Note that we end up having to do the same, because of iommu_unmap() > while DRM driver is powered off.. it might be cleaner if it was all > self contained in the iommu driver, but that would make it so other > drivers couldn't call iommu_unmap() from an irq handler, which is > apparently something that some of them want to do.. I'd assume that runtime PM status is already guaranteed to be active when the IRQ handler is running, by some other means (e.g. pm_runtime_get_sync() called earlier, when queuing some work to the hardware). Otherwise, I'm not sure how a powered down device could trigger an IRQ. So, if the master device power is already on, suppliers should be powered on as well, thanks to device links. Best regards, Tomasz
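Tomasz's argument rests on how device links propagate runtime PM. The model below is a simplified userspace sketch with hypothetical names. It is not the kernel's implementation, and it ignores locking and the atomic-context constraints raised later in the thread. In the model, resuming the consumer resumes its supplier first. A later get on the supplier from the master's IRQ path is then only a reference-count increment.

```c
#include <assert.h>

struct model_dev {
	int usage_count;
	int resumes;                /* real power-ups (slow path) */
	struct model_dev *supplier; /* single device link, or NULL */
};

static void model_get_sync(struct model_dev *d)
{
	if (d->usage_count++ == 0) {
		/* 0 -> 1: resume the supplier first, then this device */
		if (d->supplier)
			model_get_sync(d->supplier);
		d->resumes++;
	}
}

static void model_put(struct model_dev *d)
{
	/* 1 -> 0: this device suspends and releases its link reference */
	if (--d->usage_count == 0 && d->supplier)
		model_put(d->supplier);
}

/* The IOMMU's own get/put around unmap, e.g. called from the master's
 * IRQ handler. Returns 1 if it had to take the resume (slow) path. */
static int model_smmu_unmap(struct model_dev *smmu)
{
	int before = smmu->resumes;

	model_get_sync(smmu);
	/* ... update page tables, flush TLB ... */
	model_put(smmu);
	return smmu->resumes != before;
}
```

While the GPU holds a reference, the SMMU's own get/put never resumes it. Only with the master suspended does the IOMMU's get take the slow path.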
On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: > On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>> Hi Vivek, >>> >>> Thanks for the patch. Please see my comments inline. >>> >>> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >>> <vivek.gautam@codeaurora.org> wrote: >>>> While handling the concerned iommu, there should not be a >>>> need to power control the drm devices from iommu interface. >>>> If these drm devices need to be powered around this time, >>>> the respective drivers should take care of this. >>>> >>>> Replace the pm_runtime_get/put_sync(<drm_device>) with >>>> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >>>> the connected iommu through the device link interface. >>>> In case the device link is not setup these get/put_suppliers() >>>> calls will be a no-op, and the iommu driver should take care of >>>> powering on its devices accordingly. >>>> >>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>> --- >>>> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >>>> index b23d33622f37..1ab629bbee69 100644 >>>> --- a/drivers/gpu/drm/msm/msm_iommu.c >>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c >>>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >>>> struct msm_iommu *iommu = to_msm_iommu(mmu); >>>> int ret; >>>> >>>> - pm_runtime_get_sync(mmu->dev); >>>> + pm_runtime_get_suppliers(mmu->dev); >>>> ret = iommu_attach_device(iommu->domain, mmu->dev); >>>> - pm_runtime_put_sync(mmu->dev); >>>> + pm_runtime_put_suppliers(mmu->dev); >>> >>> For me, it looks like a wrong place to handle runtime PM of IOMMU >>> here. 
iommu_attach_device() calls into IOMMU driver's attach_device() >>> callback and that's where necessary runtime PM gets should happen, if >>> any. In other words, driver A (MSM DRM driver) shouldn't be dealing >>> with power state of device controlled by driver B (ARM SMMU). >> >> Note that we end up having to do the same, because of iommu_unmap() >> while DRM driver is powered off.. it might be cleaner if it was all >> self contained in the iommu driver, but that would make it so other >> drivers couldn't call iommu_unmap() from an irq handler, which is >> apparently something that some of them want to do.. > > I'd assume that runtime PM status is already guaranteed to be active > when the IRQ handler is running, by some other means (e.g. > pm_runtime_get_sync() called earlier, when queuing some work to the > hardware). Otherwise, I'm not sure how a powered down device could > trigger an IRQ. > > So, if the master device power is already on, suppliers should be > powered on as well, thanks to device links. > umm, that is kind of the inverse of the problem.. the problem is things like gpu driver (and v4l2 drivers that import dma-bufs, afaict).. they will potentially call iommu->unmap() when device is not active (due to userspace or things beyond the control of the driver).. so *they* would want iommu to do pm get/put calls. But other drivers trying to unmap from irq ctx would not. Which is the contradictory requirement that led to the idea of iommu user powering up iommu for unmap. There has already been some discussion about this on various earlier permutations of this patchset. I think we have exhausted all other options. BR, -R
On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: > On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>> Hi Vivek, >>>> >>>> Thanks for the patch. Please see my comments inline. >>>> >>>> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >>>> <vivek.gautam@codeaurora.org> wrote: >>>>> While handling the concerned iommu, there should not be a >>>>> need to power control the drm devices from iommu interface. >>>>> If these drm devices need to be powered around this time, >>>>> the respective drivers should take care of this. >>>>> >>>>> Replace the pm_runtime_get/put_sync(<drm_device>) with >>>>> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >>>>> the connected iommu through the device link interface. >>>>> In case the device link is not setup these get/put_suppliers() >>>>> calls will be a no-op, and the iommu driver should take care of >>>>> powering on its devices accordingly. 
>>>>> >>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>>> --- >>>>> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >>>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>>> >>>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >>>>> index b23d33622f37..1ab629bbee69 100644 >>>>> --- a/drivers/gpu/drm/msm/msm_iommu.c >>>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c >>>>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >>>>> struct msm_iommu *iommu = to_msm_iommu(mmu); >>>>> int ret; >>>>> >>>>> - pm_runtime_get_sync(mmu->dev); >>>>> + pm_runtime_get_suppliers(mmu->dev); >>>>> ret = iommu_attach_device(iommu->domain, mmu->dev); >>>>> - pm_runtime_put_sync(mmu->dev); >>>>> + pm_runtime_put_suppliers(mmu->dev); >>>> >>>> For me, it looks like a wrong place to handle runtime PM of IOMMU >>>> here. iommu_attach_device() calls into IOMMU driver's attach_device() >>>> callback and that's where necessary runtime PM gets should happen, if >>>> any. In other words, driver A (MSM DRM driver) shouldn't be dealing >>>> with power state of device controlled by driver B (ARM SMMU). >>> >>> Note that we end up having to do the same, because of iommu_unmap() >>> while DRM driver is powered off.. it might be cleaner if it was all >>> self contained in the iommu driver, but that would make it so other >>> drivers couldn't call iommu_unmap() from an irq handler, which is >>> apparently something that some of them want to do.. >> >> I'd assume that runtime PM status is already guaranteed to be active >> when the IRQ handler is running, by some other means (e.g. >> pm_runtime_get_sync() called earlier, when queuing some work to the >> hardware). Otherwise, I'm not sure how a powered down device could >> trigger an IRQ. >> >> So, if the master device power is already on, suppliers should be >> powered on as well, thanks to device links. >> > > umm, that is kindof the inverse of the problem.. 
the problem is > things like gpu driver (and v4l2 drivers that import dma-buf's, > afaict).. they will potentially call iommu->unmap() when device is not > active (due to userspace or things beyond the control of the driver).. > so *they* would want iommu to do pm get/put calls. Which is fine and which is actually already done by one of the patches in this series, not for map/unmap, but probe, add_device, remove_device. Having parts of the API doing it inside the callback and other parts outside sounds at least inconsistent. > But other drivers > trying to unmap from irq ctx would not. Which is the contradictory > requirement that lead to the idea of iommu user powering up iommu for > unmap. Sorry, maybe I wasn't clear. My last message was supposed to show that it's not contradictory at all, because "other drivers trying to unmap from irq ctx" would already have called pm_runtime_get_*() earlier from a non-irq ctx, which would have also done the same on all the linked suppliers, including the IOMMU. The ultimate result would be that the map/unmap() of the IOMMU driver calling pm_runtime_get_sync() would do nothing besides incrementing the reference count. > > There has already been some discussion about this on various earlier > permutations of this patchset. I think we have exhausted all other > options. I guess I should have read those. Let me do that now. Best regards, Tomasz
Hi Jordan, On Wed, Feb 14, 2018 at 1:42 AM, Jordan Crouse <jcrouse@codeaurora.org> wrote: > On Tue, Feb 13, 2018 at 06:10:38PM +0900, Tomasz Figa wrote: >> Hi Vivek, >> >> Thanks for the patch. Please see my comments inline. >> >> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >> <vivek.gautam@codeaurora.org> wrote: >> > While handling the concerned iommu, there should not be a >> > need to power control the drm devices from iommu interface. >> > If these drm devices need to be powered around this time, >> > the respective drivers should take care of this. >> > >> > Replace the pm_runtime_get/put_sync(<drm_device>) with >> > pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >> > the connected iommu through the device link interface. >> > In case the device link is not setup these get/put_suppliers() >> > calls will be a no-op, and the iommu driver should take care of >> > powering on its devices accordingly. >> > >> > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >> > --- >> > drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >> > 1 file changed, 8 insertions(+), 8 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >> > index b23d33622f37..1ab629bbee69 100644 >> > --- a/drivers/gpu/drm/msm/msm_iommu.c >> > +++ b/drivers/gpu/drm/msm/msm_iommu.c >> > @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >> > struct msm_iommu *iommu = to_msm_iommu(mmu); >> > int ret; >> > >> > - pm_runtime_get_sync(mmu->dev); >> > + pm_runtime_get_suppliers(mmu->dev); >> > ret = iommu_attach_device(iommu->domain, mmu->dev); >> > - pm_runtime_put_sync(mmu->dev); >> > + pm_runtime_put_suppliers(mmu->dev); >> >> For me, it looks like a wrong place to handle runtime PM of IOMMU >> here. iommu_attach_device() calls into IOMMU driver's attach_device() >> callback and that's where necessary runtime PM gets should happen, if >> any. 
In other words, driver A (MSM DRM driver) shouldn't be dealing >> with power state of device controlled by driver B (ARM SMMU). > > This whole thing is confused by the fact that on MSM the GPU and the GPU IOMMU > share some of the same clocks and power rail so turning on the GPU also > turned on the IOMMU register banks by extension. This is surprisingly not a very surprising case. Exactly the same can be seen on Rockchip SoCs and we're solving the problem using the solution I suggested. In fact, my suggestions to this thread are based on the design we chose for Rockchip, due to the high level of similarity (+/- the GPU directly programming IOMMU registers, which is not present there, but AFAICT it doesn't pose a problem here). > > But if we put that aside the question is who should be responsible for > controlling the power in this relationship and there are several good reasons to > leave it up to the client device. The most important reason is when we move to > the per-instance model where the GPU self-programmings the SMMU registers. In > that case, the driver will need to make sure that the SMMU is powered up before > submitting the command and then removing the power vote when the commands > are done to save energy. I might need more insight on what's going on in your hardware, but with my current understanding I'd argue that that is not right, because: - When submitting commands to the GPU, the GPU driver will pm_runtime_get_sync() on the GPU device, which will automatically do the same on all the linked suppliers, which would also include the SMMU itself. The role of device links here is exactly that the GPU driver doesn't have to care which other devices need to be brought up. - When the GPU is operating, the SMMU power must be supplied anyway, because it needs to be doing the translations, right? Note that by "power" I really mean the physical power supply in the SoC, e.g. as for a power domain. The runtime PM API in its current form (e.g. 
binary off or on operation) is unsuitable for managing other things, such as clocks (and there is ongoing work on improving it, e.g. by adding support for multiple power states). ^^ The above would actually be guaranteed by your hardware design, where SMMU and GPU share the power domain and clocks. (We used to rely on this in the old downstream implementation of Rockchip IOMMU and master drivers in the Chromium OS kernel, before we moved to handling the clocks explicitly in the IOMMU driver and properly using device links to manage the power domain and state restoration.) > > Additionally, there might be legitimate reasons in the driver to batch > operations - you may wish to attach the device and then map several global > buffers immediately - having driver side control prevents several unneeded power > transitions. As I mentioned before, these operations wouldn't normally need any power transitions, since mapping with the TLB powered down boils down to just updating the page tables in memory. However, as Robin mentioned before, there might be some hardware factors, such as the TLB being powered separately (or retaining contents in some other way), where this indeed wouldn't be ensured. Still, that's where the runtime PM autosuspend feature (i.e. delayed suspend) comes to the rescue, with the advantage of handling the cases when the master driver receives unbatched map/unmap requests (with maybe a slight drawback in terms of the suspend not happening instantly and losing some power, but it's about power domains, so mainly leakage current, isn't it?) > > Perhaps the right answer is to do both - allow for the driver to enable the > supplier but also do the right power operations at the appropriately places in > the IOMMU driver. > >> This is also important for the reasons I stated in my comments to >> "[PATCH v7 1/6] base: power: runtime: Export >> pm_runtime_get/put_suppliers".
Quoting for everyone's convenience: >> >> >> There are however cases in which the consumer wants to power-on >> >> the supplier, but not itself. >> >> E.g., A Graphics or multimedia driver wants to power-on the SMMU >> >> to unmap a buffer and finish the TLB operations without powering >> >> on itself. >> > >> >This sounds strange to me. If the SMMU is powered down, wouldn't the >> >TLB lose its contents as well (and so no flushing needed)? >> > > >> >Other than that, what kind of hardware operations would be needed >> >besides just updating the page tables from the CPU? >> > > >> In other words, the SMMU driver can deal with hardware state based on >> return value of pm_runtime_get_sync() or pm_runtime_get_if_in_use() >> and decide whether some operations are necessary or not, e.g. >> - a state restore is necessary if the domain was powered off, but we >> are bringing the master on, >> - a flush may not be required when (un)mapping with the domain powered off, >> - etc. > > I agree that there is probably some advanced logic that we can do to > conclusively figure out the state of the hardware and improve the behavior. > I would love to see the SMMU driver get smarter but for the moment we can't > trust it and so we need to force the behavior from the GPU driver. The current > code works for a5x and earlier but on sdm845 we can (no longer) treat the GPU > and the SMMU as the same device for power purposes so we need this code. Hmm, you've lost me there. Above you mention that "on MSM the GPU and the GPU IOMMU share some of the same clocks and power rail". Is this no longer the case for sdm845? If so, would you mind shedding a bit more light on how this looks there? Best regards, Tomasz
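The autosuspend point can be sketched with a hypothetical userspace model. After the last reference is dropped, the device stays powered for a configurable delay. Unbatched operations that arrive close together therefore share one power-up. In the kernel this corresponds roughly to pm_runtime_mark_last_busy() plus pm_runtime_put_autosuspend(). The model below only mimics the timing, with time passed in explicitly, and all names are illustrative.

```c
#include <assert.h>

/* Hypothetical model of a device using delayed (auto) suspend. */
struct model_autosuspend {
	int usage_count;
	int powered;
	long last_busy;     /* time of the last put */
	long delay;         /* autosuspend delay, same time units */
	int power_ups;
};

/* Take a reference at time `now`, powering up if needed. */
static void model_get(struct model_autosuspend *d, long now)
{
	/* Lazily apply a suspend that would have fired in the meantime. */
	if (d->powered && d->usage_count == 0 && now - d->last_busy >= d->delay)
		d->powered = 0;
	if (!d->powered) {
		d->powered = 1;
		d->power_ups++;
	}
	d->usage_count++;
}

/* Drop a reference; the actual suspend is deferred by `delay`. */
static void model_put_autosuspend(struct model_autosuspend *d, long now)
{
	d->usage_count--;
	d->last_busy = now;
}

/* Replay n unbatched map/unmap requests spaced `gap` apart and
 * return how many times the device had to be powered up. */
static int model_replay(long delay, int n, long gap)
{
	struct model_autosuspend d = { .delay = delay };
	int i;

	for (i = 0; i < n; i++) {
		model_get(&d, i * gap);
		model_put_autosuspend(&d, i * gap);
	}
	return d.power_ups;
}
```

With no delay, every request pays for a power-up. With a delay longer than the spacing between requests, they all share one. The cost is that the device stays powered for up to `delay` after the last request, which is the trade-off Tomasz mentions.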
Hi Tomasz, On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: > On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>> Hi Vivek, >>>>> >>>>> Thanks for the patch. Please see my comments inline. >>>>> >>>>> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >>>>> <vivek.gautam@codeaurora.org> wrote: >>>>>> While handling the concerned iommu, there should not be a >>>>>> need to power control the drm devices from iommu interface. >>>>>> If these drm devices need to be powered around this time, >>>>>> the respective drivers should take care of this. >>>>>> >>>>>> Replace the pm_runtime_get/put_sync(<drm_device>) with >>>>>> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >>>>>> the connected iommu through the device link interface. >>>>>> In case the device link is not setup these get/put_suppliers() >>>>>> calls will be a no-op, and the iommu driver should take care of >>>>>> powering on its devices accordingly. 
>>>>>> >>>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>>>> --- >>>>>> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >>>>>> index b23d33622f37..1ab629bbee69 100644 >>>>>> --- a/drivers/gpu/drm/msm/msm_iommu.c >>>>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c >>>>>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >>>>>> struct msm_iommu *iommu = to_msm_iommu(mmu); >>>>>> int ret; >>>>>> >>>>>> - pm_runtime_get_sync(mmu->dev); >>>>>> + pm_runtime_get_suppliers(mmu->dev); >>>>>> ret = iommu_attach_device(iommu->domain, mmu->dev); >>>>>> - pm_runtime_put_sync(mmu->dev); >>>>>> + pm_runtime_put_suppliers(mmu->dev); >>>>> >>>>> For me, it looks like a wrong place to handle runtime PM of IOMMU >>>>> here. iommu_attach_device() calls into IOMMU driver's attach_device() >>>>> callback and that's where necessary runtime PM gets should happen, if >>>>> any. In other words, driver A (MSM DRM driver) shouldn't be dealing >>>>> with power state of device controlled by driver B (ARM SMMU). >>>> >>>> Note that we end up having to do the same, because of iommu_unmap() >>>> while DRM driver is powered off.. it might be cleaner if it was all >>>> self contained in the iommu driver, but that would make it so other >>>> drivers couldn't call iommu_unmap() from an irq handler, which is >>>> apparently something that some of them want to do.. >>> >>> I'd assume that runtime PM status is already guaranteed to be active >>> when the IRQ handler is running, by some other means (e.g. >>> pm_runtime_get_sync() called earlier, when queuing some work to the >>> hardware). Otherwise, I'm not sure how a powered down device could >>> trigger an IRQ. >>> >>> So, if the master device power is already on, suppliers should be >>> powered on as well, thanks to device links. 
>>> >> >> umm, that is kindof the inverse of the problem.. the problem is >> things like gpu driver (and v4l2 drivers that import dma-buf's, >> afaict).. they will potentially call iommu->unmap() when device is not >> active (due to userspace or things beyond the control of the driver).. >> so *they* would want iommu to do pm get/put calls. > > Which is fine and which is actually already done by one of the patches > in this series, not for map/unmap, but probe, add_device, > remove_device. Having parts of the API doing it inside the callback > and other parts outside sounds at least inconsistent. > >> But other drivers >> trying to unmap from irq ctx would not. Which is the contradictory >> requirement that lead to the idea of iommu user powering up iommu for >> unmap. > > Sorry, maybe I wasn't clear. My last message was supposed to show that > it's not contradictory at all, because "other drivers trying to unmap > from irq ctx" would already have called pm_runtime_get_*() earlier > from a non-irq ctx, which would have also done the same on all the > linked suppliers, including the IOMMU. The ultimate result would be > that the map/unmap() of the IOMMU driver calling pm_runtime_get_sync() > would do nothing besides incrementing the reference count. The entire point was to avoid the slow path that pm_runtime_get/put_sync() would add to map/unmap. It would not be correct to add a slow path to the IRQ-context case just to take care of non-IRQ-context callers and of situations where the master is already powered off. > >> >> There has already been some discussion about this on various earlier >> permutations of this patchset. I think we have exhausted all other >> options. > > I guess I should have read those. Let me do that now. Yeah, I point to the thread in the cover letter and in [PATCH 1/6]. Thanks.
regards Vivek
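A minimal userspace sketch of the device link argument in this thread may help. It is a toy model under stated assumptions, not the kernel's runtime PM code: every name below (struct toy_dev, toy_get_sync(), toy_get_suppliers() and so on) is hypothetical, each device has at most one supplier, and the last put powers a device down immediately (no autosuspend). It shows a reference on the consumer (the GPU) being propagated to its supplier (the SMMU), and the get/put_suppliers() idea from the patch powering only the supplier, acting as a no-op when no link exists.

```c
/*
 * Toy model of runtime PM usage counting with one device link.
 * All names are hypothetical; this is not the kernel API.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
	const char *name;
	int usage;                /* runtime PM usage count */
	bool active;              /* powered while usage > 0 (no autosuspend) */
	int resumes;              /* number of power-up transitions */
	struct toy_dev *supplier; /* device link target, or NULL */
};

/* Take a reference; the first one powers the supplier, then the device. */
static void toy_get_sync(struct toy_dev *d)
{
	if (d->usage++ > 0)
		return; /* already active: only the count changes */
	if (d->supplier)
		toy_get_sync(d->supplier);
	d->active = true;
	d->resumes++;
}

/* Drop a reference; the last one powers down the device, then the supplier. */
static void toy_put_sync(struct toy_dev *d)
{
	assert(d->usage > 0);
	if (--d->usage > 0)
		return;
	d->active = false;
	if (d->supplier)
		toy_put_sync(d->supplier);
}

/* The get/put_suppliers() idea: power the supplier, leave the consumer off. */
static void toy_get_suppliers(struct toy_dev *d)
{
	if (d->supplier) /* no link: no-op */
		toy_get_sync(d->supplier);
}

static void toy_put_suppliers(struct toy_dev *d)
{
	if (d->supplier)
		toy_put_sync(d->supplier);
}
```

With smmu = { .name = "smmu" } and gpu = { .name = "gpu", .supplier = &smmu }, one toy_get_sync(&gpu) leaves both devices active and a second one only bumps gpu.usage, while toy_get_suppliers(&gpu) powers the SMMU and leaves the GPU off.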
On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam <vivek.gautam@codeaurora.org> wrote: > Hi Tomasz, > > On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: >> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote:
> The entire point was to avoid the slowpath that pm_runtime_get/put_sync() > would add in map/unmap. It would not be correct to add a slowpath in irq_ctx > for taking care of non-irq_ctx and for the situations where master is already > powered-off. Correct me if I'm wrong, but I believe that with what I'm proposing there wouldn't be any slow path. a) For IRQ context, the master is already powered on, so the SMMU is also powered on through the respective device link. pm_runtime_get_sync() would ultimately just increment the runtime PM usage count.
b) For the case when the master is already powered off (which wouldn't be IRQ context, for the reason stated in a)), powering on the SMMU is unavoidable if the SMMU hardware really needs to be accessed (i.e. some TLBs need to be invalidated, if their state is preserved despite the master being powered down). > >> >>> >>> There has already been some discussion about this on various earlier >>> permutations of this patchset. I think we have exhausted all other >>> options. >> >> I guess I should have read those. Let me do that now. > Yea, i point to the thread in cover letter and [PATCH 1/6]. > Thanks. I read through all the links in the cover letter, and indeed the other attempts did not work out, but they were different from what I'm proposing. There was also a point raised that __pm_runtime_resume(), called from pm_runtime_get_sync(), would grab the dev->power.lock spinlock. That is true, except that if the device is already active, it would hold the lock only while checking the device state, so I doubt it would be a significant point of contention. Best regards, Tomasz
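Cases a) and b) can be illustrated with a second toy model, again a hypothetical sketch rather than kernel code: toy_iommu_unmap() takes its own reference around a simulated TLB invalidation, and the resume and suspend callbacks are only counted, with no autosuspend and no locking. If the master already holds a reference through its device link (case a), the unmap moves the usage count from 1 to 2 and back without running either callback; if nothing holds a reference (case b), the unmap pays for one resume and one suspend.

```c
/*
 * Toy model of an IOMMU unmap path that takes its own runtime PM
 * reference. Hypothetical names; callbacks are counted, not executed,
 * and the last put suspends immediately (no autosuspend).
 */
#include <assert.h>
#include <stdbool.h>

struct toy_pm {
	int usage;         /* runtime PM usage count */
	bool active;       /* power state */
	int resume_calls;  /* slow path: power-up callback ran */
	int suspend_calls; /* power-down callback ran */
};

static void toy_get_sync(struct toy_pm *pm)
{
	/* Only the first reference runs the (slow) resume; later ones just count. */
	if (pm->usage++ == 0) {
		pm->active = true;
		pm->resume_calls++;
	}
}

static void toy_put_sync(struct toy_pm *pm)
{
	assert(pm->usage > 0);
	if (--pm->usage == 0) {
		pm->active = false;
		pm->suspend_calls++;
	}
}

/* Simulated unmap: the TLB invalidation needs the SMMU registers powered. */
static void toy_iommu_unmap(struct toy_pm *smmu, int *tlb_invalidations)
{
	toy_get_sync(smmu);
	assert(smmu->active);
	(*tlb_invalidations)++;
	toy_put_sync(smmu);
}
```

In case a), calling toy_get_sync() once to stand in for the master's linked reference and then toy_iommu_unmap() leaves resume_calls at 1 and suspend_calls at 0: the unmap itself only touched the count. The model does not represent the brief spinlock acquisition discussed in the message.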
Hi Tomasz, On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa <tfiga@chromium.org> wrote: > On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam > <vivek.gautam@codeaurora.org> wrote: >> Hi Tomasz, >> >> On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >>>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote:
> Correct me if I'm wrong, but I believe that with what I'm proposing > there wouldn't be any slow path. Yeah, but only when the power domain is irq-safe? Not all platforms enable irq-safe power domains; for instance, msm doesn't enable its GDSC power domains as irq-safe.
Is there something I am missing?
On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam <vivek.gautam@codeaurora.org> wrote: > Hi Tomasz, > > On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa <tfiga@chromium.org> wrote: >> On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam >> <vivek.gautam@codeaurora.org> wrote: >>> Hi Tomasz, >>> >>> On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote:
>> Correct me if I'm wrong, but I believe that with what I'm proposing >> there wouldn't be any slow path. > > Yea, but only when the power domain is irq-safe? And not all platforms > enable irq-safe power domains.
For instance, msm doesn't enable its > gdsc power domains as irq-safe. > Is it something i am missing? Being irq-safe would only matter if there were a case where the call is made from IRQ context while the power is off. As I explained in a), that shouldn't happen. Best regards, Tomasz
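The irq-safe question comes down to a small decision rule, sketched here as a hypothetical model (toy_get(), TOY_WOULD_SLEEP and struct toy_pm are made-up names, and the sketch does not reproduce the kernel's actual checks). Taking a reference only has to run a resume callback, which may sleep, when the usage count is zero; in this model that is refused from IRQ context unless the device is irq-safe. When the master already holds a reference, as in case a), the irq_safe flag is never consulted.

```c
/*
 * Toy model of taking a runtime PM reference from IRQ or process
 * context. Hypothetical names; not the kernel's implementation.
 */
#include <assert.h>
#include <stdbool.h>

#define TOY_OK 0
#define TOY_WOULD_SLEEP (-1)

struct toy_pm {
	int usage;     /* runtime PM usage count */
	bool active;   /* power state */
	bool irq_safe; /* callbacks may run in atomic context */
};

static int toy_get(struct toy_pm *pm, bool in_irq)
{
	if (pm->usage > 0) {
		/* Already powered: a count bump is safe in any context. */
		pm->usage++;
		return TOY_OK;
	}
	/* A real resume is needed; it may sleep unless the device is irq-safe. */
	if (in_irq && !pm->irq_safe)
		return TOY_WOULD_SLEEP;
	pm->usage = 1;
	pm->active = true;
	return TOY_OK;
}
```

Of the four combinations of in_irq and "already holding a reference", only IRQ context with a zero usage count ever looks at irq_safe, which is the situation the message argues should not arise.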
On Wed, Feb 14, 2018 at 2:46 PM, Tomasz Figa <tfiga@chromium.org> wrote: Adding Jordan to this thread as well. > On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam > <vivek.gautam@codeaurora.org> wrote: >> Hi Tomasz, >> >> On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>> On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam >>> <vivek.gautam@codeaurora.org> wrote: >>>> Hi Tomasz, >>>> >>>> On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote:
>> >> Yea, but only when the power domain is irq-safe? And not all platforms >> enable irq-safe power domains. For instance, msm doesn't enable its >> gdsc power domains as irq-safe. >> Is it something i am missing? > > irq-safe would matter if there would exist a case when the call is > done from IRQ context and the power is off. As I explained in a), it > shouldn't happen. Hi Robin, Will: does adding pm_runtime_get() to map/unmap sound good to you? Quoting Tomasz once again here: >>> a) For IRQ context, the master is already powered on and so the SMMU >>> is also powered on, through respective device link. >>> pm_runtime_get_sync() would ultimately just increment the runtime PM >>> usage count. >>> >>> b) For a case when the master is already powered off (which wouldn't >>> be IRQ context, for the reason stated in a)), powering on the SMMU is >>> unavoidable, if the SMMU hardware really needs to be accessed (i.e. >>> some TLBs need to be invalidated, if their state is preserved despite >>> master being powered down). >>> There was also a point raised that __pm_runtime_resume() called from >>> pm_runtime_get_sync() would grab dev->power_lock spinlock, which is >>> true, except that if the device is already active, it would do it only >>> for the time of checking device state, so I doubt it would really be a >>> significant point of contention. Regards Vivek
On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: > Hi Jordan, > > On Wed, Feb 14, 2018 at 1:42 AM, Jordan Crouse <jcrouse@codeaurora.org> wrote: > > On Tue, Feb 13, 2018 at 06:10:38PM +0900, Tomasz Figa wrote: > >> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam > >> <vivek.gautam@codeaurora.org> wrote: > >> > - pm_runtime_get_sync(mmu->dev); > >> > + pm_runtime_get_suppliers(mmu->dev); > >> > ret = iommu_attach_device(iommu->domain, mmu->dev); > >> > - pm_runtime_put_sync(mmu->dev); > >> > + pm_runtime_put_suppliers(mmu->dev); > >> > >> For me, it looks like a wrong place to handle runtime PM of IOMMU > >> here.
iommu_attach_device() calls into IOMMU driver's attach_device() > >> callback and that's where necessary runtime PM gets should happen, if > >> any. In other words, driver A (MSM DRM driver) shouldn't be dealing > >> with power state of device controlled by driver B (ARM SMMU). > > > > This whole thing is confused by the fact that on MSM the GPU and the GPU IOMMU > > share some of the same clocks and power rail so turning on the GPU also > > turned on the IOMMU register banks by extension. > > This is surprisingly not a very surprising case. Exactly the same can > be seen on Rockchip SoCs and we're solving the problem using the > solution I suggested. In fact, my suggestions to this thread are based > on the design we chose for Rockchip, due to the high level of > similarity (+/- the GPU directly programming IOMMU registers, which is > not present there, but AFAICT it doesn't pose a problem here). > > > > > But if we put that aside the question is who should be responsible for > > controlling the power in this relationship and there are several good reasons to > > leave it up to the client device. The most important reason is when we move to > > the per-instance model where the GPU self-programmings the SMMU registers. In > > that case, the driver will need to make sure that the SMMU is powered up before > > submitting the command and then removing the power vote when the commands > > are done to save energy. > > I might need more insight on what's going on in your hardware, but > with my current understanding I'd argue that that is not right, > because: > > - When submitting commands to the GPU, the GPU driver will > pm_runtime_get_sync() on the GPU device, which will automatically do > the same on all the linked suppliers, which would also include the > SMMU itself. The role of device links here is exactly that the GPU > driver doesn't have to care which other devices need to be brought up. This is true. 
Assuming that the device link works correctly we would not need to explicitly power the SMMU which makes my point entirely moot. > - When the GPU is operating, the SMMU power must be supplied anyway, > because it needs to be doing the translations, right? Note that by > "power" I really mean the physical power supply in the SoC, e.g. as > for a power domain. The runtime PM API in its current form (e.g. > binary off or on operation) is unsuitable for managing other things, > such as clocks (and there is ongoing work on improving it, e.g. by > adding support for multiple power states). As others have pointed out, the register banks and the translation unit are powered separately (or at least, clocked separately). > ^^ The above would be actually guaranteed by your hardware design, > where SMMU and GPU share the power domain and clocks. (We used to rely > on this in old downstream implementation of Rockchip IOMMU and master > drivers in Chromium OS kernel, before we moved to handling the clocks > explicitly in the IOMMU driver and properly using device links to > manage the power domain and state restoration.) I wouldn't call it a guarantee. I would instead say that it works by a happy coincidence that I don't think we should depend on. > > > > Additionally, there might be legitimate reasons in the driver to batch > > operations - you may wish to attach the device and then map several global > > buffers immediately - having driver side control prevents several unneeded power > > transitions. > > As I mentioned before, these operations wouldn't normally need any > power transitions, since mapping with the TLB powered down boils down > to just updating the page tables in memory. However, as Robin > mentioned before, there might be some hardware factors, such as TLB > being powered separately (or retaining contents in some other way), > where this wouldn't be ensured indeed. > > Still, that's where runtime PM autosuspend feature (i.e. 
delayed > suspend) comes to the rescue, with the advantage of handling the cases > when the master driver receives map/unmap requests not batched (but > maybe a slight drawback in terms of the suspend not happening > instantly and losing some power, but it's about power domains, so > mainly leakage current, isn't it?) > > > > Perhaps the right answer is to do both - allow for the driver to enable the > > supplier but also do the right power operations at the appropriately places in > > the IOMMU driver. > > > >> This is also important for the reasons I stated in my comments to > >> "[PATCH v7 1/6] base: power: runtime: Export > >> pm_runtime_get/put_suppliers". Quoting for everyone's convenience: > >> > >> >> There are however cases in which the consumer wants to power-on > >> >> the supplier, but not itself. > >> >> E.g., A Graphics or multimedia driver wants to power-on the SMMU > >> >> to unmap a buffer and finish the TLB operations without powering > >> >> on itself. > >> > > >> >This sounds strange to me. If the SMMU is powered down, wouldn't the > >> >TLB lose its contents as well (and so no flushing needed)? > >> > > > > >> >Other than that, what kind of hardware operations would be needed > >> >besides just updating the page tables from the CPU? > >> > > > > >> In other words, the SMMU driver can deal with hardware state based on > >> return value of pm_runtime_get_sync() or pm_runtime_get_if_in_use() > >> and decide whether some operations are necessary or not, e.g. > >> - a state restore is necessary if the domain was powered off, but we > >> are bringing the master on, > >> - a flush may not be required when (un)mapping with the domain powered off, > >> - etc. > > > > I agree that there is probably some advanced logic that we can do to > > conclusively figure out the state of the hardware and improve the behavior. 
> > I would love to see the SMMU driver get smarter but for the moment we can't > > trust it and so we need to force the behavior from the GPU driver. The current > > code works for a5x and earlier but on sdm845 we can (no longer) treat the GPU > > and the SMMU as the same device for power purposes so we need this code. > > Hmm, you've lost me there. Above you mention that "on MSM the GPU and > the GPU IOMMU share some of the same clocks and power rail". Is this > no longer the case for sdm845? If so, would you mind shedding a bit > more light on how this looks there? Sure. I've sent out the code, but it can be confusing, so I'll try to explain it a little better. On a5xx and earlier the GPU power/clocks were directly controlled by the CPU, so the pm resume consisted of a handful of clock controls plus the domain(s) controlled by genpd. Starting with sdm845 we have added a new integrated microcontroller called the GMU, which takes over power control for the GPU. The GMU runs in real time and it can bring the GPU power up and down very quickly, even quickly enough to collapse between frames. If done right this can save significant leakage. The problem, of course, is that the GMU is a fully featured processor in its own right, so it's no longer a matter of just turning on clocks and rails. We need to boot it, load the microcode, establish IPC and so on. As you can imagine, the GMU also uses the SMMU to share code with the CPU. The kicker is that while the SMMU and GPU share common clocks, the GMU does not, and from the perspective of the CPU the only device that we control is the GMU, so we have to treat the SMMU as a truly separate device. That's how we got to where we are. But as you said, as long as we have the device link correctly set up, I think we might just be able to get away with depending on the supplier chain working during pm resume. I'll test it out today and see how it goes. Thanks, Jordan
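A toy model of the autosuspend (delayed suspend) behaviour Tomasz describes in the quoted text above may help show why unbatched map/unmap requests need not each cost a power transition. This is a sketch under stated assumptions, not kernel code: `struct toy_autosuspend_dev` and all `toy_*` helpers are hypothetical, the suspend timer is replaced by an explicit `now_ms` argument that is checked lazily, and the real kernel counterparts it only alludes to are `pm_runtime_set_autosuspend_delay()`, `pm_runtime_mark_last_busy()` and `pm_runtime_put_autosuspend()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy device; not struct device / struct dev_pm_info. */
struct toy_autosuspend_dev {
	int usage_count;     /* references currently held */
	bool active;         /* powered up */
	long last_busy_ms;   /* idea of pm_runtime_mark_last_busy() */
	long delay_ms;       /* idea of pm_runtime_set_autosuspend_delay() */
	int power_ups;       /* real resume transitions, i.e. the cost */
};

/* Lazily apply the delayed suspend: power off only if idle long enough. */
static void toy_tick(struct toy_autosuspend_dev *d, long now_ms)
{
	if (d->active && d->usage_count == 0 &&
	    now_ms - d->last_busy_ms >= d->delay_ms)
		d->active = false;
}

/* Take a reference, powering up only if the device had really suspended. */
static void toy_get(struct toy_autosuspend_dev *d, long now_ms)
{
	toy_tick(d, now_ms);
	d->usage_count++;
	if (!d->active) {
		d->active = true;
		d->power_ups++;
	}
}

/* Drop the reference but keep the power on until the delay expires. */
static void toy_put_autosuspend(struct toy_autosuspend_dev *d, long now_ms)
{
	assert(d->usage_count > 0);
	d->usage_count--;
	d->last_busy_ms = now_ms;
}

/* One unbatched unmap request: get, (hardware maintenance), put. */
static void toy_unmap_request(struct toy_autosuspend_dev *d, long now_ms)
{
	toy_get(d, now_ms);
	/* TLB maintenance would happen here in a real driver. */
	toy_put_autosuspend(d, now_ms);
}
```

With a 100 ms delay, requests at 0, 10 and 50 ms share a single power-up; once the device has idled past the delay, a request at 300 ms pays for a second one. The trade-off, as noted in the thread, is some leakage while the device lingers in the powered state.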
On 14/02/18 10:33, Vivek Gautam wrote: > On Wed, Feb 14, 2018 at 2:46 PM, Tomasz Figa <tfiga@chromium.org> wrote: > > Adding Jordan to this thread as well. > >> On Wed, Feb 14, 2018 at 6:13 PM, Vivek Gautam >> <vivek.gautam@codeaurora.org> wrote: >>> Hi Tomasz, >>> >>> On Wed, Feb 14, 2018 at 11:08 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>> On Wed, Feb 14, 2018 at 1:17 PM, Vivek Gautam >>>> <vivek.gautam@codeaurora.org> wrote: >>>>> Hi Tomasz, >>>>> >>>>> On Wed, Feb 14, 2018 at 8:31 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>>> On Wed, Feb 14, 2018 at 11:13 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>>> On Tue, Feb 13, 2018 at 8:59 PM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>>>>> On Wed, Feb 14, 2018 at 3:03 AM, Rob Clark <robdclark@gmail.com> wrote: >>>>>>>>> On Tue, Feb 13, 2018 at 4:10 AM, Tomasz Figa <tfiga@chromium.org> wrote: >>>>>>>>>> Hi Vivek, >>>>>>>>>> >>>>>>>>>> Thanks for the patch. Please see my comments inline. >>>>>>>>>> >>>>>>>>>> On Wed, Feb 7, 2018 at 7:31 PM, Vivek Gautam >>>>>>>>>> <vivek.gautam@codeaurora.org> wrote: >>>>>>>>>>> While handling the concerned iommu, there should not be a >>>>>>>>>>> need to power control the drm devices from iommu interface. >>>>>>>>>>> If these drm devices need to be powered around this time, >>>>>>>>>>> the respective drivers should take care of this. >>>>>>>>>>> >>>>>>>>>>> Replace the pm_runtime_get/put_sync(<drm_device>) with >>>>>>>>>>> pm_runtime_get/put_suppliers(<drm_device>) calls, to power-up >>>>>>>>>>> the connected iommu through the device link interface. >>>>>>>>>>> In case the device link is not setup these get/put_suppliers() >>>>>>>>>>> calls will be a no-op, and the iommu driver should take care of >>>>>>>>>>> powering on its devices accordingly. 
>>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org> >>>>>>>>>>> --- >>>>>>>>>>> drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++-------- >>>>>>>>>>> 1 file changed, 8 insertions(+), 8 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c >>>>>>>>>>> index b23d33622f37..1ab629bbee69 100644 >>>>>>>>>>> --- a/drivers/gpu/drm/msm/msm_iommu.c >>>>>>>>>>> +++ b/drivers/gpu/drm/msm/msm_iommu.c >>>>>>>>>>> @@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names, >>>>>>>>>>> struct msm_iommu *iommu = to_msm_iommu(mmu); >>>>>>>>>>> int ret; >>>>>>>>>>> >>>>>>>>>>> - pm_runtime_get_sync(mmu->dev); >>>>>>>>>>> + pm_runtime_get_suppliers(mmu->dev); >>>>>>>>>>> ret = iommu_attach_device(iommu->domain, mmu->dev); >>>>>>>>>>> - pm_runtime_put_sync(mmu->dev); >>>>>>>>>>> + pm_runtime_put_suppliers(mmu->dev); >>>>>>>>>> >>>>>>>>>> For me, it looks like a wrong place to handle runtime PM of IOMMU >>>>>>>>>> here. iommu_attach_device() calls into IOMMU driver's attach_device() >>>>>>>>>> callback and that's where necessary runtime PM gets should happen, if >>>>>>>>>> any. In other words, driver A (MSM DRM driver) shouldn't be dealing >>>>>>>>>> with power state of device controlled by driver B (ARM SMMU). >>>>>>>>> >>>>>>>>> Note that we end up having to do the same, because of iommu_unmap() >>>>>>>>> while DRM driver is powered off.. it might be cleaner if it was all >>>>>>>>> self contained in the iommu driver, but that would make it so other >>>>>>>>> drivers couldn't call iommu_unmap() from an irq handler, which is >>>>>>>>> apparently something that some of them want to do.. >>>>>>>> >>>>>>>> I'd assume that runtime PM status is already guaranteed to be active >>>>>>>> when the IRQ handler is running, by some other means (e.g. >>>>>>>> pm_runtime_get_sync() called earlier, when queuing some work to the >>>>>>>> hardware). 
Otherwise, I'm not sure how a powered down device could >>>>>>>> trigger an IRQ. >>>>>>>> >>>>>>>> So, if the master device power is already on, suppliers should be >>>>>>>> powered on as well, thanks to device links. >>>>>>>> >>>>>>> >>>>>>> umm, that is kindof the inverse of the problem.. the problem is >>>>>>> things like gpu driver (and v4l2 drivers that import dma-buf's, >>>>>>> afaict).. they will potentially call iommu->unmap() when device is not >>>>>>> active (due to userspace or things beyond the control of the driver).. >>>>>>> so *they* would want iommu to do pm get/put calls. >>>>>> >>>>>> Which is fine and which is actually already done by one of the patches >>>>>> in this series, not for map/unmap, but probe, add_device, >>>>>> remove_device. Having parts of the API doing it inside the callback >>>>>> and other parts outside sounds at least inconsistent. >>>>>> >>>>>>> But other drivers >>>>>>> trying to unmap from irq ctx would not. Which is the contradictory >>>>>>> requirement that lead to the idea of iommu user powering up iommu for >>>>>>> unmap. >>>>>> >>>>>> Sorry, maybe I wasn't clear. My last message was supposed to show that >>>>>> it's not contradictory at all, because "other drivers trying to unmap >>>>>> from irq ctx" would already have called pm_runtime_get_*() earlier >>>>>> from a non-irq ctx, which would have also done the same on all the >>>>>> linked suppliers, including the IOMMU. The ultimate result would be >>>>>> that the map/unmap() of the IOMMU driver calling pm_runtime_get_sync() >>>>>> would do nothing besides incrementing the reference count. >>>>> >>>>> The entire point was to avoid the slowpath that pm_runtime_get/put_sync() >>>>> would add in map/unmap. It would not be correct to add a slowpath in irq_ctx >>>>> for taking care of non-irq_ctx and for the situations where master is already >>>>> powered-off. >>>> >>>> Correct me if I'm wrong, but I believe that with what I'm proposing >>>> there wouldn't be any slow path. 
>>> >>> Yea, but only when the power domain is irq-safe? And not all platforms >>> enable irq-safe power domains. For instance, msm doesn't enable its >>> gdsc power domains as irq-safe. >>> Is it something i am missing? >> >> irq-safe would matter if there would exist a case when the call is >> done from IRQ context and the power is off. As I explained in a), it >> shouldn't happen. > > Hi Robin, Will > > Does adding pm_runtime_get() in map/unmap sounds good to you? Given that we spent significant effort last year removing as much locking as we possibly could from the map/unmap path to minimise the significant performance impact it was having on networking/storage/etc. workloads, I really don't want to introduce more for the sake of one specific use-case, so no. Robin.
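The argument in the quoted exchange is that a master holding a runtime PM reference also keeps its linked suppliers (here the SMMU) powered, so a get issued by the IOMMU driver inside map/unmap would only bump a counter. A toy model can illustrate this. It is a sketch, not the kernel implementation: `struct toy_dev`, `toy_link()`, `toy_get_sync()` and `toy_put_sync()` are hypothetical stand-ins for the ideas behind `device_link_add()` and `pm_runtime_get_sync()`/`pm_runtime_put_sync()`, and no locking, error handling or irq-safety is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SUPPLIERS 4

/* Hypothetical toy device with runtime-PM-like reference counting. */
struct toy_dev {
	const char *name;
	int usage_count;   /* references held on this device */
	bool active;       /* powered up */
	int resumes;       /* real power-up transitions */
	struct toy_dev *suppliers[TOY_MAX_SUPPLIERS];
	size_t nr_suppliers;
};

/* Record a consumer -> supplier dependency (the idea of a device link). */
static void toy_link(struct toy_dev *consumer, struct toy_dev *supplier)
{
	assert(consumer->nr_suppliers < TOY_MAX_SUPPLIERS);
	consumer->suppliers[consumer->nr_suppliers++] = supplier;
}

/* Take a reference; on the first one, power suppliers up before the device. */
static void toy_get_sync(struct toy_dev *dev)
{
	if (dev->usage_count++ > 0)
		return;   /* already held: only the counter changes */
	for (size_t i = 0; i < dev->nr_suppliers; i++)
		toy_get_sync(dev->suppliers[i]);
	dev->active = true;
	dev->resumes++;
}

/* Drop a reference; on the last one, power down, then release suppliers. */
static void toy_put_sync(struct toy_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count > 0)
		return;
	dev->active = false;
	for (size_t i = 0; i < dev->nr_suppliers; i++)
		toy_put_sync(dev->suppliers[i]);
}
```

Linking `gpu` to `smmu` and calling `toy_get_sync(&gpu)` powers both; a nested `toy_get_sync(&smmu)`/`toy_put_sync(&smmu)` pair, standing in for the IOMMU driver's own get/put in unmap, leaves `smmu.resumes` at 1. Whether the real map/unmap path stays this cheap on a given platform is exactly what the thread is debating.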
On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse <jcrouse@codeaurora.org> wrote: > On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: >> >> - When submitting commands to the GPU, the GPU driver will >> pm_runtime_get_sync() on the GPU device, which will automatically do >> the same on all the linked suppliers, which would also include the >> SMMU itself. The role of device links here is exactly that the GPU >> driver doesn't have to care which other devices need to be brought up. > > This is true. Assuming that the device link works correctly we would not need > to explicitly power the SMMU which makes my point entirely moot. Just to point out what motivated this patchset, the biggest problem is iommu_unmap(), because that can happen when the GPU is not powered on (or in the v4l2 case, because some other device dropped its reference to the dma-buf, allowing it to be freed). Currently we pm get/put the GPU device around unmap, but it is kinda silly to boot up the GPU just to unmap a buffer. (Semi-related, I would also like to batch map/unmaps; I just haven't gotten around to implementing it yet, but that would be another case where a single get_supplier()/put_supplier() outside of the iommu would make sense instead of pm_get/put() inside the iommu driver's ->unmap().) If you really dislike the get/put_supplier() approach, then perhaps we need iommu_pm_get()/iommu_pm_put() operations that the iommu user could use to accomplish the same thing? BR, -R
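Rob's batching point can be made concrete with a toy counter model that compares a get/put around every unmap with a single reference held by the caller around a whole batch (which is what a caller-side `pm_runtime_get_suppliers()`, or the proposed but not existing `iommu_pm_get()`/`iommu_pm_put()`, would provide). Everything below is hypothetical: `struct toy_iommu` and its helpers are illustrative only, and the model deliberately has no autosuspend, so every drop of the count to zero powers the device off immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IOMMU with a plain power reference count. */
struct toy_iommu {
	int usage_count;
	bool powered;
	int power_ups;   /* real power-up transitions */
	int unmapped;    /* unmap operations performed */
};

static void toy_iommu_get(struct toy_iommu *m)
{
	if (m->usage_count++ == 0) {
		m->powered = true;
		m->power_ups++;
	}
}

static void toy_iommu_put(struct toy_iommu *m)
{
	assert(m->usage_count > 0);
	if (--m->usage_count == 0)
		m->powered = false;   /* no autosuspend in this model */
}

/* The unmap itself expects the caller to have powered the IOMMU. */
static void toy_iommu_unmap(struct toy_iommu *m)
{
	assert(m->powered);
	m->unmapped++;
}

/* Per-call voting: every unmap pays for its own power-up. */
static void unmap_each(struct toy_iommu *m, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		toy_iommu_get(m);
		toy_iommu_unmap(m);
		toy_iommu_put(m);
	}
}

/* Batched voting: one reference held around the whole batch. */
static void unmap_batched(struct toy_iommu *m, size_t n)
{
	toy_iommu_get(m);
	for (size_t i = 0; i < n; i++)
		toy_iommu_unmap(m);
	toy_iommu_put(m);
}
```

In this model, unmapping eight buffers one by one costs eight power-ups against one for the batched version. With runtime PM autosuspend enabled inside the IOMMU driver, as suggested elsewhere in the thread, the gap between the two would shrink.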
On Thu, Feb 15, 2018 at 1:03 AM, Robin Murphy <robin.murphy@arm.com> wrote: > On 14/02/18 10:33, Vivek Gautam wrote: >> [...] >> Hi Robin, Will >> >> Does adding pm_runtime_get() in map/unmap sounds good to you? > > > Given that we spent significant effort last year removing as much locking as > we possibly could from the map/unmap path to minimise the significant > performance impact it was having on networking/storage/etc. workloads, I > really don't want to introduce more for the sake of one specific use-case, > so no. Could you elaborate on what kind of locking you are concerned about? As I explained before, the normally happening fast path would lock dev->power_lock only for the brief moment of incrementing the runtime PM usage counter. Best regards, Tomasz
On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark <robdclark@gmail.com> wrote: > On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse <jcrouse@codeaurora.org> wrote: >> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: >>> >>> - When submitting commands to the GPU, the GPU driver will >>> pm_runtime_get_sync() on the GPU device, which will automatically do >>> the same on all the linked suppliers, which would also include the >>> SMMU itself. The role of device links here is exactly that the GPU >>> driver doesn't have to care which other devices need to be brought up. >> >> This is true. Assuming that the device link works correctly we would not need >> to explicitly power the SMMU which makes my point entirely moot. > > Just to point out what motivated this patchset, the biggest problem is > iommu_unmap() because that can happen when GPU is not powered on (or > in the v4l2 case, because some other device dropped it's reference to > the dma-buf allowing it to be free'd). Currently we pm get/put the > GPU device around unmap, but it is kinda silly to boot up the GPU just > to unmap a buffer. Note that in V4L2 both mapping and unmapping can happen completely without involving the driver. So AFAICT the approach being implemented by this patchset will not work, because there will be no one to power up the IOMMU before the operation. Moreover, there are platforms for which there is no reason to power up the IOMMU just for map/unmap, because the hardware state is lost anyway and the only real work needed is updating the page tables in memory. (I feel like this is actually true for most of the platforms in the wild, but this is based purely on the not so small number of platforms I worked with, haven't bothered looking for more general evidence.) > > (Semi-related, I would also like to batch map/unmap's, I just haven't > gotten around to implementing it yet.. 
but that would be another case > where a single get_supplier()/put_supplier() outside of the iommu > would make sense instead of pm_get/put() inside the iommu driver's > ->unmap().) > > If you really dislike the get/put_supplier() approach, then perhaps we > need iommu_pm_get()/iommu_pm_put() operations that the iommu user > could use to accomplish the same thing? I'm afraid this wouldn't work for V4L2 either. And I still haven't been given any evidence that the approach I'm suggesting, which relies only on existing pieces of infrastructure and which worked for both Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC SoCs. Best regards, Tomasz
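Tomasz's position is that only the IOMMU driver knows whether a given operation needs the hardware at all. Here is a minimal sketch of that idea, assuming (as stated in the thread for Rockchip/Exynos, and not established for every SMMU) that TLB contents are lost when the power domain is off: the in-memory page table is always updated, and registers are touched only if a reference can be taken without powering the device up. `struct toy_smmu` and its helpers are hypothetical; `toy_get_if_in_use()` only imitates the documented semantics of the kernel's `pm_runtime_get_if_in_use()`, which succeeds only if the device is active and its usage count is already non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PTES 16

/* Hypothetical toy SMMU: page tables in memory plus a counted "register" op. */
struct toy_smmu {
	int usage_count;          /* e.g. held by the master via a device link */
	bool active;              /* power domain on, state restored */
	uint64_t ptes[TOY_PTES];  /* page table lives in DRAM, always writable */
	int tlb_invalidations;    /* hardware accesses actually issued */
};

/* Take a reference only if the device is already active and in use;
 * never power it up (idea of pm_runtime_get_if_in_use()). */
static bool toy_get_if_in_use(struct toy_smmu *s)
{
	if (!s->active || s->usage_count == 0)
		return false;
	s->usage_count++;
	return true;
}

static void toy_put(struct toy_smmu *s)
{
	assert(s->usage_count > 0);
	s->usage_count--;
}

/* Unmap: always clear the PTE; invalidate the TLB only if powered. If the
 * domain is off, the TLB is assumed empty, so there is nothing to flush. */
static void toy_smmu_unmap(struct toy_smmu *s, unsigned int idx)
{
	assert(idx < TOY_PTES);
	s->ptes[idx] = 0;
	if (toy_get_if_in_use(s)) {
		s->tlb_invalidations++;
		toy_put(s);
	}
}
```

An unmap while the SMMU is off only clears the PTE; the same unmap while the master holds the SMMU active also issues one invalidation. For SMMU implementations whose TLB is powered separately, as Robin mentioned earlier in the thread, skipping the invalidation would not be safe, which is an argument for keeping this decision inside the IOMMU driver.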
On Thu, Feb 15, 2018 at 12:17 PM, Tomasz Figa <tfiga@chromium.org> wrote: > On Thu, Feb 15, 2018 at 1:03 AM, Robin Murphy <robin.murphy@arm.com> wrote: >> [...] >> Given that we spent significant effort last year removing as much locking as >> we possibly could from the map/unmap path to minimise the significant >> performance impact it was having on networking/storage/etc. workloads, I >> really don't want to introduce more for the sake of one specific use-case, >> so no. > > Could you elaborate on what kind of locking you are concerned about?
> As I explained before, the normally happening fast path would lock
> dev->power_lock only for the brief moment of incrementing the runtime
> PM usage counter.

My bad, that's not even it.

The atomic usage counter is incremented beforehand, without any
locking [1], and the spinlock is acquired only to validate that the
device's runtime PM state is indeed still valid [2]. That would be the
case in the fast path of the same driver doing two mappings in
parallel with the master powered on (and thus the SMMU too, through
device links; if the master was not already powered on, powering on
the SMMU is unavoidable anyway, and that would add much more latency
than the spinlock itself).

[1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
[2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613

In any case, I can't imagine this working with V4L2, or anything else
that relies on memory management more generic than calling the IOMMU
API directly from the driver, if the IOMMU device has runtime PM
enabled but runtime PM is not managed from the IOMMU driver's
callbacks that need access to the hardware. As I mentioned before,
only the IOMMU driver knows exactly when real hardware access is
needed (e.g. Rockchip/Exynos don't need it for map/unmap if the power
is down, but some SMMU implementations with a separately powered TLB
might).

Best regards,
Tomasz
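The runtime PM "get" fast path described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code. `struct model_dev`, `model_get_sync()`, `model_put_sync()` and the `atomic_flag` spinlock are hypothetical stand-ins for `dev->power.usage_count`, `dev->power.lock` and the real resume logic, which does considerably more. The model shows only the shape of the argument: the usage counter is bumped atomically without a lock, and the lock is held just long enough to check whether the device is already active. Only the first user pays for a real power-up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code. The fields stand in for
 * dev->power.usage_count, dev->power.lock and the runtime PM status.
 */
struct model_dev {
	atomic_int usage_count; /* bumped without taking the lock */
	atomic_flag lock;       /* tiny spinlock standing in for dev->power.lock */
	bool active;            /* "RPM_ACTIVE"; only touched under the lock */
	int resume_calls;       /* real power-ups performed */
};

static void model_lock(struct model_dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire))
		; /* spin */
}

static void model_unlock(struct model_dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

static void model_dev_init(struct model_dev *d)
{
	atomic_init(&d->usage_count, 0);
	atomic_flag_clear(&d->lock);
	d->active = false;
	d->resume_calls = 0;
}

static void model_get_sync(struct model_dev *d)
{
	atomic_fetch_add(&d->usage_count, 1); /* lockless increment */

	model_lock(d);
	if (!d->active) {        /* slow path: first user powers the device up */
		d->active = true;
		d->resume_calls++;
	}
	model_unlock(d);         /* fast path: lock held only for the check */
}

static void model_put_sync(struct model_dev *d)
{
	int prev = atomic_fetch_sub(&d->usage_count, 1);

	assert(prev > 0);        /* catches an unbalanced put */
	if (prev != 1)
		return;              /* other users remain: nothing to do */

	model_lock(d);
	/* Re-check under the lock: a concurrent get may have raced in. */
	if (atomic_load(&d->usage_count) == 0)
		d->active = false;
	model_unlock(d);
}
```

In this model a second get on an already active device never powers anything up. Its only cost on shared state is the atomic increment and one short critical section.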
On Wed, Feb 14, 2018 at 11:09 PM, Tomasz Figa <tfiga@chromium.org> wrote: > On Thu, Feb 15, 2018 at 1:12 AM, Rob Clark <robdclark@gmail.com> wrote: >> On Wed, Feb 14, 2018 at 10:48 AM, Jordan Crouse <jcrouse@codeaurora.org> wrote: >>> On Wed, Feb 14, 2018 at 12:31:29PM +0900, Tomasz Figa wrote: >>>> >>>> - When submitting commands to the GPU, the GPU driver will >>>> pm_runtime_get_sync() on the GPU device, which will automatically do >>>> the same on all the linked suppliers, which would also include the >>>> SMMU itself. The role of device links here is exactly that the GPU >>>> driver doesn't have to care which other devices need to be brought up. >>> >>> This is true. Assuming that the device link works correctly we would not need >>> to explicitly power the SMMU which makes my point entirely moot. >> >> Just to point out what motivated this patchset, the biggest problem is >> iommu_unmap() because that can happen when GPU is not powered on (or >> in the v4l2 case, because some other device dropped it's reference to >> the dma-buf allowing it to be free'd). Currently we pm get/put the >> GPU device around unmap, but it is kinda silly to boot up the GPU just >> to unmap a buffer. > > Note that in V4L2 both mapping and unmapping can happen completely > without involving the driver. So AFAICT the approach being implemented > by this patchset will not work, because there will be no one to power > up the IOMMU before the operation. Moreover, there are platforms for > which there is no reason to power up the IOMMU just for map/unmap, > because the hardware state is lost anyway and the only real work > needed is updating the page tables in memory. (I feel like this is > actually true for most of the platforms in the wild, but this is based > purely on the not so small number of platforms I worked with, haven't > bothered looking for more general evidence.) 
>
At least as far as drm/msm/adreno is concerned, I'm not terribly
concerned about other platforms that don't need to power up the iommu.
It's not really the same situation as an IP block that shows up in all
the different vendors' SoCs.

But if you can convince Robin to go for get/put_sync() calls inside the
iommu driver, I'm fine with that approach too. That is what I already
do in qcom_iommu. But if not, I'd like to at least solve this for some
platforms if we can't solve it for all.

BR,
-R

>>
>> (Semi-related, I would also like to batch map/unmap's, I just haven't
>> gotten around to implementing it yet.. but that would be another case
>> where a single get_supplier()/put_supplier() outside of the iommu
>> would make sense instead of pm_get/put() inside the iommu driver's
>> ->unmap().)
>>
>> If you really dislike the get/put_supplier() approach, then perhaps we
>> need iommu_pm_get()/iommu_pm_put() operations that the iommu user
>> could use to accomplish the same thing?
>
> I'm afraid this wouldn't work for V4L2 either. And I still haven't
> been given any evidence that the approach I'm suggesting, which relies
> only on existing pieces of infrastructure and which worked for both
> Exynos and Rockchip, including V4L2, wouldn't work for SMMU and/or QC
> SoCs.
>
> Best regards,
> Tomasz
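Tomasz's point, quoted in Rob's reply, is that on some platforms unmap only needs to update the page tables in memory while the IOMMU is powered down. It can be sketched with a single-threaded userspace model. All names are hypothetical. `model_get_if_in_use()` loosely imitates the kernel's `pm_runtime_get_if_in_use()`, which takes a reference only if the device is already active. The model touches the hardware (a TLB invalidation) only when the IOMMU is live. When it is off, its state is assumed lost, so there is nothing stale to invalidate and no reason to power it up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NR_PTES 16

/* Hypothetical model of an IOMMU whose register/TLB state is lost when powered down. */
struct model_iommu {
	uint64_t ptes[MODEL_NR_PTES]; /* page table in ordinary memory */
	int usage;                     /* > 0: powered and in use */
	int tlb_invalidations;         /* hardware accesses performed */
};

/* Loosely imitates pm_runtime_get_if_in_use(): take a reference only if already in use. */
static bool model_get_if_in_use(struct model_iommu *m)
{
	if (m->usage == 0)
		return false;
	m->usage++;
	return true;
}

static void model_put(struct model_iommu *m)
{
	assert(m->usage > 0);
	m->usage--;
}

static void model_unmap(struct model_iommu *m, unsigned int idx)
{
	assert(idx < MODEL_NR_PTES);
	m->ptes[idx] = 0;             /* always: update the in-memory table */

	if (model_get_if_in_use(m)) { /* only if the hardware is live ... */
		m->tlb_invalidations++;   /* ... invalidate its TLB */
		model_put(m);
	}
	/* Otherwise the TLB contents were lost at power-down: nothing to flush. */
}
```

The assumption that all hardware state is lost on power-down may not hold for some SMMU implementations, e.g. those with a separately powered TLB, as noted elsewhere in the thread.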
On 15/02/18 04:17, Tomasz Figa wrote:
[...]
>> Could you elaborate on what kind of locking you are concerned about?
>> As I explained before, the normally happening fast path would lock
>> dev->power_lock only for the brief moment of incrementing the runtime
>> PM usage counter.
>
> My bad, that's not even it.
>
> The atomic usage counter is incremented beforehands, without any
> locking [1] and the spinlock is acquired only for the sake of
> validating that device's runtime PM state remained valid indeed [2],
> which would be the case in the fast path of the same driver doing two
> mappings in parallel, with the master powered on (and so the SMMU,
> through device links; if master was not powered on already, powering
> on the SMMU is unavoidable anyway and it would add much more latency
> than the spinlock itself).

We now have no locking at all in the map path, and only a per-domain
lock around TLB sync in unmap, which is unfortunately necessary for
correctness; the latter isn't too terrible, since in "serious" hardware
it should only be serialising a few CPUs serving the same device
against each other (e.g. for multiple queues on a single NIC).

Putting in a global lock which serialises *all* concurrent map and
unmap calls for *all* unrelated devices makes things worse. Period.
Even if the lock itself were held for the minimum possible time, i.e.
trivially "spin_lock(&lock); spin_unlock(&lock)", the cost of
repeatedly bouncing that one cache line around between 96 CPUs across
two sockets is not negligible.
> [1] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
> [2] http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613
>
> In any case, I can't imagine this working with V4L2 or anything else
> relying on any memory management more generic than calling IOMMU API
> directly from the driver, with the IOMMU device having runtime PM
> enabled, but without managing the runtime PM from the IOMMU driver's
> callbacks that need access to the hardware. As I mentioned before,
> only the IOMMU driver knows when exactly the real hardware access
> needs to be done (e.g. Rockchip/Exynos don't need to do that for
> map/unmap if the power is down, but some implementations of SMMU with
> TLB powered separately might need to do so).

It's worth noting that Exynos and Rockchip are relatively small,
self-contained IP blocks integrated closely with the interfaces of
their respective master devices; SMMU is an architecture, whose
implementations may be large and distributed, with complex and wildly
differing internal topologies. As such, it's a lot harder to make
hardware-specific assumptions and/or be correct for all possible cases.

Don't get me wrong, I do agree that the IOMMU driver is ultimately the
only agent that knows which calls are going to be necessary for
whatever operation it's performing on its own hardware*; it's just
that for SMMU it needs to be implemented in a way that has zero impact
on the cases where it doesn't matter, because it's not viable to
specialise that driver for any particular IP implementation/use-case.

Robin.

*AFAICS it still makes some sense to have the get_suppliers option as
well, though: the IOMMU driver does what it needs for correctness
internally, but an external consumer doing something non-standard can
grab and hold the link around multiple calls to short-circuit that.
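Robin's footnote, and Rob's earlier wish to batch map/unmap calls, can be sketched with another hypothetical userspace model. The IOMMU driver takes its own reference around every operation so that it is correct on its own. A consumer that knows it is about to issue many calls holds a supplier reference across the whole batch. The model assumes the device suspends as soon as its count drops to zero (no autosuspend delay). Names such as `model_unmap_batch()` are illustrative only and are not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: suspends as soon as the count reaches zero
 * (no autosuspend delay).
 */
struct model_supplier {
	int usage;    /* runtime PM usage count */
	int resumes;  /* 0 -> 1 transitions (real power-ups) */
	int suspends; /* 1 -> 0 transitions (real power-downs) */
};

static void model_get(struct model_supplier *s)
{
	if (s->usage++ == 0)
		s->resumes++;
}

static void model_put(struct model_supplier *s)
{
	assert(s->usage > 0);
	if (--s->usage == 0)
		s->suspends++;
}

/* IOMMU driver side: correct on its own, takes its own reference. */
static void model_iommu_unmap(struct model_supplier *iommu)
{
	model_get(iommu);
	/* ... page table update and TLB invalidation would happen here ... */
	model_put(iommu);
}

/* Consumer side: hold the IOMMU up across a batch, as a get_suppliers()-style call would. */
static void model_unmap_batch(struct model_supplier *iommu, size_t n)
{
	model_get(iommu);
	for (size_t i = 0; i < n; i++)
		model_iommu_unmap(iommu); /* inner get/put only move the count 1 -> 2 -> 1 */
	model_put(iommu);
}
```

With the outer reference held, the inner get/put pairs never reach zero. Eight batched unmaps therefore cost one power cycle instead of eight.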
On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> wrote: > On 15/02/18 04:17, Tomasz Figa wrote: > [...] >>> >>> Could you elaborate on what kind of locking you are concerned about? >>> As I explained before, the normally happening fast path would lock >>> dev->power_lock only for the brief moment of incrementing the runtime >>> PM usage counter. >> >> >> My bad, that's not even it. >> >> The atomic usage counter is incremented beforehands, without any >> locking [1] and the spinlock is acquired only for the sake of >> validating that device's runtime PM state remained valid indeed [2], >> which would be the case in the fast path of the same driver doing two >> mappings in parallel, with the master powered on (and so the SMMU, >> through device links; if master was not powered on already, powering >> on the SMMU is unavoidable anyway and it would add much more latency >> than the spinlock itself). > > > We now have no locking at all in the map path, and only a per-domain lock > around TLB sync in unmap which is unfortunately necessary for correctness; > the latter isn't too terrible, since in "serious" hardware it should only be > serialising a few cpus serving the same device against each other (e.g. for > multiple queues on a single NIC). > > Putting in a global lock which serialises *all* concurrent map and unmap > calls for *all* unrelated devices makes things worse. Period. Even if the > lock itself were held for the minimum possible time, i.e. trivially > "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that > one cache line around between 96 CPUs across two sockets is not negligible. Fair enough. 
Note that we're in quite an interesting situation now:
a) We need to have runtime PM enabled on Qualcomm SoCs to have power
properly managed,
b) We need to have lock-free map/unmap on such distributed systems,
c) If runtime PM is enabled, we need to call into runtime PM from any
code that does hardware accesses, otherwise the IOMMU API (and so the
DMA API, and then any V4L2 driver) becomes unusable.

I can see one more way that could potentially let us have all three.
How about enabling runtime PM only on selected implementations (e.g.
qcom,smmu) and then having all the runtime PM calls surrounded with
if (pm_runtime_enabled()), which is lockless?

>
>> [1]
>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028
>> [2]
>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613
>>
>> In any case, I can't imagine this working with V4L2 or anything else
>> relying on any memory management more generic than calling IOMMU API
>> directly from the driver, with the IOMMU device having runtime PM
>> enabled, but without managing the runtime PM from the IOMMU driver's
>> callbacks that need access to the hardware. As I mentioned before,
>> only the IOMMU driver knows when exactly the real hardware access
>> needs to be done (e.g. Rockchip/Exynos don't need to do that for
>> map/unmap if the power is down, but some implementations of SMMU with
>> TLB powered separately might need to do so).
>
>
> It's worth noting that Exynos and Rockchip are relatively small
> self-contained IP blocks integrated closely with the interfaces of their
> relevant master devices; SMMU is an architecture, implementations of which
> may be large, distributed, and have complex and wildly differing internal
> topologies. As such, it's a lot harder to make hardware-specific assumptions
> and/or be correct for all possible cases.
>
> Don't get me wrong, I do ultimately agree that the IOMMU driver is the only
> agent who ultimately knows what calls are going to be necessary for whatever
> operation it's performing on its own hardware*; it's just that for SMMU it
> needs to be implemented in a way that has zero impact on the cases where it
> doesn't matter, because it's not viable to specialise that driver for any
> particular IP implementation/use-case.

Still, exactly the same holds for low-power embedded use cases, where
we strive for the lowest possible power consumption while also keeping
performance high. The SMMU code is therefore expected to work for our
use cases as well, such as V4L2 or DRM drivers. Since these points
don't hold for the current SMMU code, I could say that it has already
been specialized for large, distributed implementations.

Best regards,
Tomasz
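The proposal above can be sketched as a single-threaded userspace model. Under that proposal, runtime PM is enabled only for implementations that need it, and every runtime PM call is guarded by a lockless `pm_runtime_enabled()` check. `pm_runtime_enabled()` is a real kernel helper. Everything named `model_*` is hypothetical, and the `rpm_enabled` field merely stands in for that helper. The model tracks counters only. It shows that an implementation without runtime PM never touches PM state on the unmap path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; rpm_enabled stands in for pm_runtime_enabled(smmu->dev). */
struct model_smmu {
	bool rpm_enabled; /* set only for implementations that need runtime PM */
	int usage;        /* runtime PM usage count; 0 means powered down here */
	int resumes;      /* real power-ups */
	int tlb_syncs;    /* hardware accesses performed */
};

static void model_rpm_get(struct model_smmu *s)
{
	if (!s->rpm_enabled)
		return;          /* runtime PM off: no PM state touched at all */
	if (s->usage++ == 0)
		s->resumes++;    /* first user powers the SMMU up */
}

static void model_rpm_put(struct model_smmu *s)
{
	if (!s->rpm_enabled)
		return;
	assert(s->usage > 0);
	s->usage--;
}

static void model_smmu_unmap(struct model_smmu *s)
{
	model_rpm_get(s);
	s->tlb_syncs++;      /* TLB maintenance needs the hardware powered */
	model_rpm_put(s);
}
```

The guard costs one read of per-device state. Large SMMU deployments that never enable runtime PM therefore skip PM bookkeeping entirely, while qcom,smmu-style implementations still get their power handling.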
On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa <tfiga@chromium.org> wrote: > On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> wrote: >> On 15/02/18 04:17, Tomasz Figa wrote: >> [...] >>>> >>>> Could you elaborate on what kind of locking you are concerned about? >>>> As I explained before, the normally happening fast path would lock >>>> dev->power_lock only for the brief moment of incrementing the runtime >>>> PM usage counter. >>> >>> >>> My bad, that's not even it. >>> >>> The atomic usage counter is incremented beforehands, without any >>> locking [1] and the spinlock is acquired only for the sake of >>> validating that device's runtime PM state remained valid indeed [2], >>> which would be the case in the fast path of the same driver doing two >>> mappings in parallel, with the master powered on (and so the SMMU, >>> through device links; if master was not powered on already, powering >>> on the SMMU is unavoidable anyway and it would add much more latency >>> than the spinlock itself). >> >> >> We now have no locking at all in the map path, and only a per-domain lock >> around TLB sync in unmap which is unfortunately necessary for correctness; >> the latter isn't too terrible, since in "serious" hardware it should only be >> serialising a few cpus serving the same device against each other (e.g. for >> multiple queues on a single NIC). >> >> Putting in a global lock which serialises *all* concurrent map and unmap >> calls for *all* unrelated devices makes things worse. Period. Even if the >> lock itself were held for the minimum possible time, i.e. trivially >> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that >> one cache line around between 96 CPUs across two sockets is not negligible. > > Fair enough. 
Note that we're in a quite interesting situation now: > a) We need to have runtime PM enabled on Qualcomm SoC to have power > properly managed, > b) We need to have lock-free map/unmap on such distributed systems, > c) If runtime PM is enabled, we need to call into runtime PM from any > code that does hardware accesses, otherwise the IOMMU API (and so DMA > API and then any V4L2 driver) becomes unusable. > > I can see one more way that could potentially let us have all the > three. How about enabling runtime PM only on selected implementations > (e.g. qcom,smmu) and then having all the runtime PM calls surrounded > with if (pm_runtime_enabled()), which is lockless? > Sorry for pinging, but any opinion on this kind of approach? Best regards, Tomasz >> >>> [1] >>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 >>> [2] >>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 >>> >>> In any case, I can't imagine this working with V4L2 or anything else >>> relying on any memory management more generic than calling IOMMU API >>> directly from the driver, with the IOMMU device having runtime PM >>> enabled, but without managing the runtime PM from the IOMMU driver's >>> callbacks that need access to the hardware. As I mentioned before, >>> only the IOMMU driver knows when exactly the real hardware access >>> needs to be done (e.g. Rockchip/Exynos don't need to do that for >>> map/unmap if the power is down, but some implementations of SMMU with >>> TLB powered separately might need to do so). >> >> >> It's worth noting that Exynos and Rockchip are relatively small >> self-contained IP blocks integrated closely with the interfaces of their >> relevant master devices; SMMU is an architecture, implementations of which >> may be large, distributed, and have complex and wildly differing internal >> topologies. 
As such, it's a lot harder to make hardware-specific assumptions >> and/or be correct for all possible cases. >> >> Don't get me wrong, I do ultimately agree that the IOMMU driver is the only >> agent who ultimately knows what calls are going to be necessary for whatever >> operation it's performing on its own hardware*; it's just that for SMMU it >> needs to be implemented in a way that has zero impact on the cases where it >> doesn't matter, because it's not viable to specialise that driver for any >> particular IP implementation/use-case. > > Still, exactly the same holds for the low power embedded use cases, > where we strive for the lowest possible power consumption, while > maintaining performance levels high as well. And so the SMMU code is > expected to also work with our use cases, such as V4L2 or DRM drivers. > Since these points don't hold for current SMMU code, I could say that > the it has been already specialized for large, distributed > implementations. > > Best regards, > Tomasz
On Thu, Feb 22, 2018 at 3:13 AM, Tomasz Figa <tfiga@chromium.org> wrote: > On Fri, Feb 16, 2018 at 9:13 AM, Tomasz Figa <tfiga@chromium.org> wrote: >> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> wrote: >>> On 15/02/18 04:17, Tomasz Figa wrote: >>> [...] >>>>> >>>>> Could you elaborate on what kind of locking you are concerned about? >>>>> As I explained before, the normally happening fast path would lock >>>>> dev->power_lock only for the brief moment of incrementing the runtime >>>>> PM usage counter. >>>> >>>> >>>> My bad, that's not even it. >>>> >>>> The atomic usage counter is incremented beforehands, without any >>>> locking [1] and the spinlock is acquired only for the sake of >>>> validating that device's runtime PM state remained valid indeed [2], >>>> which would be the case in the fast path of the same driver doing two >>>> mappings in parallel, with the master powered on (and so the SMMU, >>>> through device links; if master was not powered on already, powering >>>> on the SMMU is unavoidable anyway and it would add much more latency >>>> than the spinlock itself). >>> >>> >>> We now have no locking at all in the map path, and only a per-domain lock >>> around TLB sync in unmap which is unfortunately necessary for correctness; >>> the latter isn't too terrible, since in "serious" hardware it should only be >>> serialising a few cpus serving the same device against each other (e.g. for >>> multiple queues on a single NIC). >>> >>> Putting in a global lock which serialises *all* concurrent map and unmap >>> calls for *all* unrelated devices makes things worse. Period. Even if the >>> lock itself were held for the minimum possible time, i.e. trivially >>> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that >>> one cache line around between 96 CPUs across two sockets is not negligible. >> >> Fair enough. 
Note that we're in a quite interesting situation now: >> a) We need to have runtime PM enabled on Qualcomm SoC to have power >> properly managed, >> b) We need to have lock-free map/unmap on such distributed systems, >> c) If runtime PM is enabled, we need to call into runtime PM from any >> code that does hardware accesses, otherwise the IOMMU API (and so DMA >> API and then any V4L2 driver) becomes unusable. >> >> I can see one more way that could potentially let us have all the >> three. How about enabling runtime PM only on selected implementations >> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded >> with if (pm_runtime_enabled()), which is lockless? >> > > Sorry for pinging, but any opinion on this kind of approach? > It is ok by me, for whatever that is worth BR, -R
[sorry, I had intended to reply sooner but clearly forgot] On 16/02/18 00:13, Tomasz Figa wrote: > On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> wrote: >> On 15/02/18 04:17, Tomasz Figa wrote: >> [...] >>>> >>>> Could you elaborate on what kind of locking you are concerned about? >>>> As I explained before, the normally happening fast path would lock >>>> dev->power_lock only for the brief moment of incrementing the runtime >>>> PM usage counter. >>> >>> >>> My bad, that's not even it. >>> >>> The atomic usage counter is incremented beforehands, without any >>> locking [1] and the spinlock is acquired only for the sake of >>> validating that device's runtime PM state remained valid indeed [2], >>> which would be the case in the fast path of the same driver doing two >>> mappings in parallel, with the master powered on (and so the SMMU, >>> through device links; if master was not powered on already, powering >>> on the SMMU is unavoidable anyway and it would add much more latency >>> than the spinlock itself). >> >> >> We now have no locking at all in the map path, and only a per-domain lock >> around TLB sync in unmap which is unfortunately necessary for correctness; >> the latter isn't too terrible, since in "serious" hardware it should only be >> serialising a few cpus serving the same device against each other (e.g. for >> multiple queues on a single NIC). >> >> Putting in a global lock which serialises *all* concurrent map and unmap >> calls for *all* unrelated devices makes things worse. Period. Even if the >> lock itself were held for the minimum possible time, i.e. trivially >> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing that >> one cache line around between 96 CPUs across two sockets is not negligible. > > Fair enough. 
Note that we're in a quite interesting situation now: > a) We need to have runtime PM enabled on Qualcomm SoC to have power > properly managed, > b) We need to have lock-free map/unmap on such distributed systems, > c) If runtime PM is enabled, we need to call into runtime PM from any > code that does hardware accesses, otherwise the IOMMU API (and so DMA > API and then any V4L2 driver) becomes unusable. > > I can see one more way that could potentially let us have all the > three. How about enabling runtime PM only on selected implementations > (e.g. qcom,smmu) and then having all the runtime PM calls surrounded > with if (pm_runtime_enabled()), which is lockless? Yes, that's the kind of thing I was gravitating towards - my vague thought was adding some flag to the smmu_domain, but pm_runtime_enabled() does look conceptually a lot cleaner. >> >>> [1] >>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L1028 >>> [2] >>> http://elixir.free-electrons.com/linux/v4.16-rc1/source/drivers/base/power/runtime.c#L613 >>> >>> In any case, I can't imagine this working with V4L2 or anything else >>> relying on any memory management more generic than calling IOMMU API >>> directly from the driver, with the IOMMU device having runtime PM >>> enabled, but without managing the runtime PM from the IOMMU driver's >>> callbacks that need access to the hardware. As I mentioned before, >>> only the IOMMU driver knows when exactly the real hardware access >>> needs to be done (e.g. Rockchip/Exynos don't need to do that for >>> map/unmap if the power is down, but some implementations of SMMU with >>> TLB powered separately might need to do so). 
>> >> >> It's worth noting that Exynos and Rockchip are relatively small >> self-contained IP blocks integrated closely with the interfaces of their >> relevant master devices; SMMU is an architecture, implementations of which >> may be large, distributed, and have complex and wildly differing internal >> topologies. As such, it's a lot harder to make hardware-specific assumptions >> and/or be correct for all possible cases. >> >> Don't get me wrong, I do ultimately agree that the IOMMU driver is the only >> agent who ultimately knows what calls are going to be necessary for whatever >> operation it's performing on its own hardware*; it's just that for SMMU it >> needs to be implemented in a way that has zero impact on the cases where it >> doesn't matter, because it's not viable to specialise that driver for any >> particular IP implementation/use-case. > > Still, exactly the same holds for the low power embedded use cases, > where we strive for the lowest possible power consumption, while > maintaining performance levels high as well. And so the SMMU code is > expected to also work with our use cases, such as V4L2 or DRM drivers. > Since these points don't hold for current SMMU code, I could say that > the it has been already specialized for large, distributed > implementations. Heh, really it's specialised for ease of maintenance in terms of doing as little as we can get away with, but for what we have implemented, fast code does save CPU cycles and power on embedded systems too ;) Robin.
On Thu, Feb 22, 2018 at 10:45 PM, Robin Murphy <robin.murphy@arm.com> wrote: > [sorry, I had intended to reply sooner but clearly forgot] > > > On 16/02/18 00:13, Tomasz Figa wrote: >> >> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> >> wrote: >>> >>> On 15/02/18 04:17, Tomasz Figa wrote: >>> [...] >>>>> >>>>> >>>>> Could you elaborate on what kind of locking you are concerned about? >>>>> As I explained before, the normally happening fast path would lock >>>>> dev->power_lock only for the brief moment of incrementing the runtime >>>>> PM usage counter. >>>> >>>> >>>> >>>> My bad, that's not even it. >>>> >>>> The atomic usage counter is incremented beforehands, without any >>>> locking [1] and the spinlock is acquired only for the sake of >>>> validating that device's runtime PM state remained valid indeed [2], >>>> which would be the case in the fast path of the same driver doing two >>>> mappings in parallel, with the master powered on (and so the SMMU, >>>> through device links; if master was not powered on already, powering >>>> on the SMMU is unavoidable anyway and it would add much more latency >>>> than the spinlock itself). >>> >>> >>> >>> We now have no locking at all in the map path, and only a per-domain lock >>> around TLB sync in unmap which is unfortunately necessary for >>> correctness; >>> the latter isn't too terrible, since in "serious" hardware it should only >>> be >>> serialising a few cpus serving the same device against each other (e.g. >>> for >>> multiple queues on a single NIC). >>> >>> Putting in a global lock which serialises *all* concurrent map and unmap >>> calls for *all* unrelated devices makes things worse. Period. Even if the >>> lock itself were held for the minimum possible time, i.e. trivially >>> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing >>> that >>> one cache line around between 96 CPUs across two sockets is not >>> negligible. >> >> >> Fair enough. 
Note that we're in a quite interesting situation now: >> a) We need to have runtime PM enabled on Qualcomm SoC to have power >> properly managed, >> b) We need to have lock-free map/unmap on such distributed systems, >> c) If runtime PM is enabled, we need to call into runtime PM from any >> code that does hardware accesses, otherwise the IOMMU API (and so DMA >> API and then any V4L2 driver) becomes unusable. >> >> I can see one more way that could potentially let us have all the >> three. How about enabling runtime PM only on selected implementations >> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded >> with if (pm_runtime_enabled()), which is lockless? > > > Yes, that's the kind of thing I was gravitating towards - my vague thought > was adding some flag to the smmu_domain, but pm_runtime_enabled() does look > conceptually a lot cleaner. Great, thanks. Looks like we're in agreement now. \o/ Vivek, does this sound reasonable to you? Best regards, Tomasz
Hi, On Thu, Feb 22, 2018 at 7:42 PM, Tomasz Figa <tfiga@chromium.org> wrote: > On Thu, Feb 22, 2018 at 10:45 PM, Robin Murphy <robin.murphy@arm.com> wrote: >> [sorry, I had intended to reply sooner but clearly forgot] >> >> >> On 16/02/18 00:13, Tomasz Figa wrote: >>> >>> On Fri, Feb 16, 2018 at 2:14 AM, Robin Murphy <robin.murphy@arm.com> >>> wrote: >>>> >>>> On 15/02/18 04:17, Tomasz Figa wrote: >>>> [...] >>>>>> >>>>>> >>>>>> Could you elaborate on what kind of locking you are concerned about? >>>>>> As I explained before, the normally happening fast path would lock >>>>>> dev->power_lock only for the brief moment of incrementing the runtime >>>>>> PM usage counter. >>>>> >>>>> >>>>> >>>>> My bad, that's not even it. >>>>> >>>>> The atomic usage counter is incremented beforehands, without any >>>>> locking [1] and the spinlock is acquired only for the sake of >>>>> validating that device's runtime PM state remained valid indeed [2], >>>>> which would be the case in the fast path of the same driver doing two >>>>> mappings in parallel, with the master powered on (and so the SMMU, >>>>> through device links; if master was not powered on already, powering >>>>> on the SMMU is unavoidable anyway and it would add much more latency >>>>> than the spinlock itself). >>>> >>>> >>>> >>>> We now have no locking at all in the map path, and only a per-domain lock >>>> around TLB sync in unmap which is unfortunately necessary for >>>> correctness; >>>> the latter isn't too terrible, since in "serious" hardware it should only >>>> be >>>> serialising a few cpus serving the same device against each other (e.g. >>>> for >>>> multiple queues on a single NIC). >>>> >>>> Putting in a global lock which serialises *all* concurrent map and unmap >>>> calls for *all* unrelated devices makes things worse. Period. Even if the >>>> lock itself were held for the minimum possible time, i.e. 
trivially >>>> "spin_lock(&lock); spin_unlock(&lock)", the cost of repeatedly bouncing >>>> that >>>> one cache line around between 96 CPUs across two sockets is not >>>> negligible. >>> >>> >>> Fair enough. Note that we're in a quite interesting situation now: >>> a) We need to have runtime PM enabled on Qualcomm SoC to have power >>> properly managed, >>> b) We need to have lock-free map/unmap on such distributed systems, >>> c) If runtime PM is enabled, we need to call into runtime PM from any >>> code that does hardware accesses, otherwise the IOMMU API (and so DMA >>> API and then any V4L2 driver) becomes unusable. >>> >>> I can see one more way that could potentially let us have all the >>> three. How about enabling runtime PM only on selected implementations >>> (e.g. qcom,smmu) and then having all the runtime PM calls surrounded >>> with if (pm_runtime_enabled()), which is lockless? >> >> >> Yes, that's the kind of thing I was gravitating towards - my vague thought >> was adding some flag to the smmu_domain, but pm_runtime_enabled() does look >> conceptually a lot cleaner. > > Great, thanks. Looks like we're in agreement now. \o/ > > Vivek, does this sound reasonable to you?

Yeah, sounds good to me. I will respin the patches.

Thanks & Regards
Vivek

>
> Best regards,
> Tomasz
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index b23d33622f37..1ab629bbee69 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -40,9 +40,9 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int ret;
 
-	pm_runtime_get_sync(mmu->dev);
+	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_attach_device(iommu->domain, mmu->dev);
-	pm_runtime_put_sync(mmu->dev);
+	pm_runtime_put_suppliers(mmu->dev);
 
 	return ret;
 }
@@ -52,9 +52,9 @@ static void msm_iommu_detach(struct msm_mmu *mmu, const char * const *names,
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 
-	pm_runtime_get_sync(mmu->dev);
+	pm_runtime_get_suppliers(mmu->dev);
 	iommu_detach_device(iommu->domain, mmu->dev);
-	pm_runtime_put_sync(mmu->dev);
+	pm_runtime_put_suppliers(mmu->dev);
 }
 
 static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
@@ -63,9 +63,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;
 
-//	pm_runtime_get_sync(mmu->dev);
+	pm_runtime_get_suppliers(mmu->dev);
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
-//	pm_runtime_put_sync(mmu->dev);
+	pm_runtime_put_suppliers(mmu->dev);
 	WARN_ON(ret < 0);
 
 	return (ret == len) ? 0 : -EINVAL;
@@ -76,9 +76,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 
-	pm_runtime_get_sync(mmu->dev);
+	pm_runtime_get_suppliers(mmu->dev);
 	iommu_unmap(iommu->domain, iova, len);
-	pm_runtime_put_sync(mmu->dev);
+	pm_runtime_put_suppliers(mmu->dev);
 
 	return 0;
 }
While handling the concerned iommu, there should not be a
need to power control the drm devices from the iommu interface.
If these drm devices need to be powered around this time,
the respective drivers should take care of this.

Replace the pm_runtime_get/put_sync(<drm_device>) with
pm_runtime_get/put_suppliers(<drm_device>) calls, to power up
the connected iommu through the device link interface.
In case the device link is not set up, these get/put_suppliers()
calls will be a no-op, and the iommu driver should take care of
powering on its devices accordingly.

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)
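The commit message relies on get/put_suppliers() being a no-op when no device link exists. A hypothetical userspace model of that behaviour follows. Getting a consumer's suppliers walks its links and takes a reference on each supplier, without powering the consumer itself, which is the difference from calling `pm_runtime_get_sync()` on the consumer. The real helpers in drivers/base/power/runtime.c are more involved (link flags, runtime PM state). `struct model_device` and the `model_*` functions are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_LINKS 4

/* Hypothetical model of a device with runtime-PM-managed supplier links. */
struct model_device {
	int usage; /* this device's own runtime PM usage count */
	struct model_device *suppliers[MODEL_MAX_LINKS];
	size_t nr_suppliers;
};

static void model_link_add(struct model_device *consumer,
			   struct model_device *supplier)
{
	assert(consumer->nr_suppliers < MODEL_MAX_LINKS);
	consumer->suppliers[consumer->nr_suppliers++] = supplier;
}

/* Takes a reference on every linked supplier, but not on the consumer. */
static void model_get_suppliers(struct model_device *consumer)
{
	for (size_t i = 0; i < consumer->nr_suppliers; i++)
		consumer->suppliers[i]->usage++;
}

static void model_put_suppliers(struct model_device *consumer)
{
	for (size_t i = 0; i < consumer->nr_suppliers; i++) {
		assert(consumer->suppliers[i]->usage > 0);
		consumer->suppliers[i]->usage--;
	}
}
```

Before a link is added, the walk visits nothing, so the IOMMU stays unpowered. That is the case where, per the commit message, the iommu driver has to power itself up.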