diff mbox series

[1/2] drm/i915/display: remove small micro-optimizations in irq handling

Message ID 20240408125445.3227678-1-jani.nikula@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] drm/i915/display: remove small micro-optimizations in irq handling | expand

Commit Message

Jani Nikula April 8, 2024, 12:54 p.m. UTC
The raw register reads/writes are there as micro-optimizations to avoid
multiple pointer indirections on uncore->regs. Presumably this is useful
when there are plenty of register reads/writes in the same
function. However, the display irq handling only has a few raw
reads/writes. Remove them for simplification.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>
---
 drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Comments

Sripada, Radhakrishna April 17, 2024, 8:42 p.m. UTC | #1
LGTM,
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>

> -----Original Message-----
> From: Intel-gfx <intel-gfx-bounces@lists.freedesktop.org> On Behalf Of Jani
> Nikula
> Sent: Monday, April 8, 2024 5:55 AM
> To: intel-gfx@lists.freedesktop.org; intel-xe@lists.freedesktop.org
> Cc: Nikula, Jani <jani.nikula@intel.com>; De Marchi, Lucas
> <lucas.demarchi@intel.com>
> Subject: [PATCH 1/2] drm/i915/display: remove small micro-optimizations in irq
> handling
> 
> The raw register reads/writes are there as micro-optimizations to avoid
> multiple pointer indirections on uncore->regs. Presumably this is useful
> when there are plenty of register reads/writes in the same
> function. However, the display irq handling only has a few raw
> reads/writes. Remove them for simplification.
> 
> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> ---
>  drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c
> b/drivers/gpu/drm/i915/display/intel_display_irq.c
> index f846c5b108b5..d4ae9139be39 100644
> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private
> *dev_priv, u32 master_ctl)
> 
>  u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32
> master_ctl)
>  {
> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>  	u32 iir;
> 
>  	if (!(master_ctl & GEN11_GU_MISC_IRQ))
>  		return 0;
> 
> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>  	if (likely(iir))
> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
> 
>  	return iir;
>  }
> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct
> drm_i915_private *i915, const u32 iir)
> 
>  void gen11_display_irq_handler(struct drm_i915_private *i915)
>  {
> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
> +	u32 disp_ctl;
> 
>  	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>  	/*
>  	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>  	 * for the display related bits.
>  	 */
> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
> +
> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>  	gen8_de_irq_handler(i915, disp_ctl);
> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
> -		      GEN11_DISPLAY_IRQ_ENABLE);
> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL,
> GEN11_DISPLAY_IRQ_ENABLE);
> 
>  	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>  }
> --
> 2.39.2
Lucas De Marchi April 17, 2024, 9:44 p.m. UTC | #2
On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>The raw register reads/writes are there as micro-optimizations to avoid
>multiple pointer indirections on uncore->regs. Presumably this is useful
>when there are plenty of register reads/writes in the same
>function. However, the display irq handling only has a few raw
>reads/writes. Remove them for simplification.

I think that comment didn't age well. Not to say there's something wrong
with this commit, but just to make sure we are aware of the additional
stuff going on and we if we are ok with that.

using intel_de_read() in place of raw_reg_read() will do (for newer
platforms):

	1) Read FPGA_DBG to detect unclaimed access before the actual read
	2) Find the relevant forcewake for that register, acquire and wait for ack
	3) readl(reg)
	4) Read FPGA_DBG to detect unclaimed access after the actual read
	5) Trace reg rw

That's much more than a pointer indirection. Are we ok with that in the
irq?  Also, I don't know why but we have variants to skip tracing (step
5 above), but on my books a disabled tracepoint is order of magnitudes
less overhead than 1, 2 and 4.

btw, if we drop the raw accesses, then we can probably drop (1) above.

Lucas De Marchi

>
>Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>---
> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
> 1 file changed, 7 insertions(+), 8 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>index f846c5b108b5..d4ae9139be39 100644
>--- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>+++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>@@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>
> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
> {
>-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> 	u32 iir;
>
> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
> 		return 0;
>
>-	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>+	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
> 	if (likely(iir))
>-		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>+		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>
> 	return iir;
> }
>@@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>
> void gen11_display_irq_handler(struct drm_i915_private *i915)
> {
>-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>-	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>+	u32 disp_ctl;
>
> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
> 	/*
> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
> 	 * for the display related bits.
> 	 */
>-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>+	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>+
>+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
> 	gen8_de_irq_handler(i915, disp_ctl);
>-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>-		      GEN11_DISPLAY_IRQ_ENABLE);
>+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>
> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
> }
>-- 
>2.39.2
>
Jani Nikula April 18, 2024, 9:49 a.m. UTC | #3
On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>>The raw register reads/writes are there as micro-optimizations to avoid
>>multiple pointer indirections on uncore->regs. Presumably this is useful
>>when there are plenty of register reads/writes in the same
>>function. However, the display irq handling only has a few raw
>>reads/writes. Remove them for simplification.
>
> I think that comment didn't age well. Not to say there's something wrong
> with this commit, but just to make sure we are aware of the additional
> stuff going on and we if we are ok with that.
>
> using intel_de_read() in place of raw_reg_read() will do (for newer
> platforms):
>
> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
> 	2) Find the relevant forcewake for that register, acquire and wait for ack
> 	3) readl(reg)
> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
> 	5) Trace reg rw
>
> That's much more than a pointer indirection. Are we ok with that in the
> irq?  Also, I don't know why but we have variants to skip tracing (step
> 5 above), but on my books a disabled tracepoint is order of magnitudes
> less overhead than 1, 2 and 4.

Honestly, I don't really know.

The thing is, we have these ad hoc optimizations all over the place. Why
do we have the raw access in two places, but not everywhere in irq
handling? The pointer indirection thing really only makes sense if you
have a lot of access in a function, but that's not the case. You do have
a point about everything else.

What would the interface be like if display were its own module? We
couldn't just wrap it all in a bunch of macros and static inlines. Is
the end result that display irq handling needs to call functions via
pointers in another module? Or do we need to move the register level irq
handling to xe and i915 cores, and handle the display parts at a higher
abstraction level?

BR,
Jani.


>
> btw, if we drop the raw accesses, then we can probably drop (1) above.
>
> Lucas De Marchi
>
>>
>>Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>>---
>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>index f846c5b108b5..d4ae9139be39 100644
>>--- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>>+++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>@@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>>
>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
>> {
>>-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>> 	u32 iir;
>>
>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
>> 		return 0;
>>
>>-	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>>+	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>> 	if (likely(iir))
>>-		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>>+		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>>
>> 	return iir;
>> }
>>@@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>>
>> void gen11_display_irq_handler(struct drm_i915_private *i915)
>> {
>>-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>-	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>>+	u32 disp_ctl;
>>
>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>> 	/*
>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>> 	 * for the display related bits.
>> 	 */
>>-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>>+	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>>+
>>+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>> 	gen8_de_irq_handler(i915, disp_ctl);
>>-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>>-		      GEN11_DISPLAY_IRQ_ENABLE);
>>+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>>
>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>> }
>>-- 
>>2.39.2
>>
Tvrtko Ursulin April 18, 2024, 10:57 a.m. UTC | #4
On 18/04/2024 10:49, Jani Nikula wrote:
> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>>> The raw register reads/writes are there as micro-optimizations to avoid
>>> multiple pointer indirections on uncore->regs. Presumably this is useful
>>> when there are plenty of register reads/writes in the same
>>> function. However, the display irq handling only has a few raw
>>> reads/writes. Remove them for simplification.
>>
>> I think that comment didn't age well. Not to say there's something wrong
>> with this commit, but just to make sure we are aware of the additional
>> stuff going on and we if we are ok with that.
>>
>> using intel_de_read() in place of raw_reg_read() will do (for newer
>> platforms):
>>
>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
>> 	3) readl(reg)
>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
>> 	5) Trace reg rw
>>
>> That's much more than a pointer indirection. Are we ok with that in the
>> irq?  Also, I don't know why but we have variants to skip tracing (step
>> 5 above), but on my books a disabled tracepoint is order of magnitudes
>> less overhead than 1, 2 and 4.
> 
> Honestly, I don't really know.
> 
> The thing is, we have these ad hoc optimizations all over the place. Why
> do we have the raw access in two places, but not everywhere in irq
> handling? The pointer indirection thing really only makes sense if you
> have a lot of access in a function, but that's not the case. You do have
> a point about everything else.

The "why only two" places is I think simply an artefact of refactoring 
and code evolution. Initially all IRQ handling was in one function, then 
later gen11 and display parts got split out as more platforms were 
added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
interrupt handling").

As for the original rationale, it was described in commits like:

2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")

Obviosuly, once a portion of a handler was/is extracted, pointer caching 
to avoid uncore->regs reloads may not make full sense any more due 
function calls potentially overshadowing that cost.

As for unclaimed debug, I would say it is probably okay to not burden 
the irq handlers with it, but if the display folks think a little bit of 
extra cost in this sub-handlers is fine that would sound plausible to me 
given the frequency of display related interrupts is low. So for me 
patch is fine if it makes the display decoupling easier.

> What would the interface be like if display were its own module? We
> couldn't just wrap it all in a bunch of macros and static inlines. Is
> the end result that display irq handling needs to call functions via
> pointers in another module? Or do we need to move the register level irq
> handling to xe and i915 cores, and handle the display parts at a higher
> abstraction level?

AFAIR no trace variants were not for performance but to avoid log spam 
when debugging stuff. From things like busy/polling loops.

Regards,

Tvrtko
>>
>> btw, if we drop the raw accesses, then we can probably drop (1) above.
>>
>> Lucas De Marchi
>>
>>>
>>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>>> ---
>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> index f846c5b108b5..d4ae9139be39 100644
>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>>>
>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
>>> {
>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>> 	u32 iir;
>>>
>>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
>>> 		return 0;
>>>
>>> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>>> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>>> 	if (likely(iir))
>>> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>>> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>>>
>>> 	return iir;
>>> }
>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>>>
>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
>>> {
>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>>> +	u32 disp_ctl;
>>>
>>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>>> 	/*
>>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>>> 	 * for the display related bits.
>>> 	 */
>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>>> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>>> +
>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>>> 	gen8_de_irq_handler(i915, disp_ctl);
>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>>> -		      GEN11_DISPLAY_IRQ_ENABLE);
>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>>>
>>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>>> }
>>> -- 
>>> 2.39.2
>>>
>
Jani Nikula Sept. 17, 2024, 10:58 a.m. UTC | #5
On Thu, 18 Apr 2024, Tvrtko Ursulin <tursulin@ursulin.net> wrote:
> On 18/04/2024 10:49, Jani Nikula wrote:
>> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
>>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
>>>> The raw register reads/writes are there as micro-optimizations to avoid
>>>> multiple pointer indirections on uncore->regs. Presumably this is useful
>>>> when there are plenty of register reads/writes in the same
>>>> function. However, the display irq handling only has a few raw
>>>> reads/writes. Remove them for simplification.
>>>
>>> I think that comment didn't age well. Not to say there's something wrong
>>> with this commit, but just to make sure we are aware of the additional
>>> stuff going on and we if we are ok with that.
>>>
>>> using intel_de_read() in place of raw_reg_read() will do (for newer
>>> platforms):
>>>
>>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
>>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
>>> 	3) readl(reg)
>>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
>>> 	5) Trace reg rw
>>>
>>> That's much more than a pointer indirection. Are we ok with that in the
>>> irq?  Also, I don't know why but we have variants to skip tracing (step
>>> 5 above), but on my books a disabled tracepoint is order of magnitudes
>>> less overhead than 1, 2 and 4.
>> 
>> Honestly, I don't really know.
>> 
>> The thing is, we have these ad hoc optimizations all over the place. Why
>> do we have the raw access in two places, but not everywhere in irq
>> handling? The pointer indirection thing really only makes sense if you
>> have a lot of access in a function, but that's not the case. You do have
>> a point about everything else.
>
> The "why only two" places is I think simply an artefact of refactoring 
> and code evolution. Initially all IRQ handling was in one function, then 
> later gen11 and display parts got split out as more platforms were 
> added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
> interrupt handling").
>
> As for the original rationale, it was described in commits like:
>
> 2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
> c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")
>
> Obviosuly, once a portion of a handler was/is extracted, pointer caching 
> to avoid uncore->regs reloads may not make full sense any more due 
> function calls potentially overshadowing that cost.
>
> As for unclaimed debug, I would say it is probably okay to not burden 
> the irq handlers with it, but if the display folks think a little bit of 
> extra cost in this sub-handlers is fine that would sound plausible to me 
> given the frequency of display related interrupts is low. So for me 
> patch is fine if it makes the display decoupling easier.
>
>> What would the interface be like if display were its own module? We
>> couldn't just wrap it all in a bunch of macros and static inlines. Is
>> the end result that display irq handling needs to call functions via
>> pointers in another module? Or do we need to move the register level irq
>> handling to xe and i915 cores, and handle the display parts at a higher
>> abstraction level?
>
> AFAIR no trace variants were not for performance but to avoid log spam 
> when debugging stuff. From things like busy/polling loops.

Bumping a forgotten topic.

Ville, Rodrigo, are we okay with the changes here?

BR,
Jani.

>
> Regards,
>
> Tvrtko
>>>
>>> btw, if we drop the raw accesses, then we can probably drop (1) above.
>>>
>>> Lucas De Marchi
>>>
>>>>
>>>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
>>>> ---
>>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
>>>> 1 file changed, 7 insertions(+), 8 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> index f846c5b108b5..d4ae9139be39 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
>>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
>>>>
>>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
>>>> {
>>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>>> 	u32 iir;
>>>>
>>>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
>>>> 		return 0;
>>>>
>>>> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
>>>> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
>>>> 	if (likely(iir))
>>>> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
>>>> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
>>>>
>>>> 	return iir;
>>>> }
>>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
>>>>
>>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
>>>> {
>>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
>>>> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
>>>> +	u32 disp_ctl;
>>>>
>>>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
>>>> 	/*
>>>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
>>>> 	 * for the display related bits.
>>>> 	 */
>>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
>>>> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
>>>> +
>>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
>>>> 	gen8_de_irq_handler(i915, disp_ctl);
>>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
>>>> -		      GEN11_DISPLAY_IRQ_ENABLE);
>>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
>>>>
>>>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
>>>> }
>>>> -- 
>>>> 2.39.2
>>>>
>>
Rodrigo Vivi Sept. 17, 2024, 2:02 p.m. UTC | #6
On Tue, Sep 17, 2024 at 01:58:19PM +0300, Jani Nikula wrote:
> On Thu, 18 Apr 2024, Tvrtko Ursulin <tursulin@ursulin.net> wrote:
> > On 18/04/2024 10:49, Jani Nikula wrote:
> >> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
> >>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
> >>>> The raw register reads/writes are there as micro-optimizations to avoid
> >>>> multiple pointer indirections on uncore->regs. Presumably this is useful
> >>>> when there are plenty of register reads/writes in the same
> >>>> function. However, the display irq handling only has a few raw
> >>>> reads/writes. Remove them for simplification.
> >>>
> >>> I think that comment didn't age well. Not to say there's something wrong
> >>> with this commit, but just to make sure we are aware of the additional
> >>> stuff going on and we if we are ok with that.
> >>>
> >>> using intel_de_read() in place of raw_reg_read() will do (for newer
> >>> platforms):
> >>>
> >>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
> >>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
> >>> 	3) readl(reg)
> >>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
> >>> 	5) Trace reg rw
> >>>
> >>> That's much more than a pointer indirection. Are we ok with that in the
> >>> irq?  Also, I don't know why but we have variants to skip tracing (step
> >>> 5 above), but on my books a disabled tracepoint is order of magnitudes
> >>> less overhead than 1, 2 and 4.
> >> 
> >> Honestly, I don't really know.
> >> 
> >> The thing is, we have these ad hoc optimizations all over the place. Why
> >> do we have the raw access in two places, but not everywhere in irq
> >> handling? The pointer indirection thing really only makes sense if you
> >> have a lot of access in a function, but that's not the case. You do have
> >> a point about everything else.
> >
> > The "why only two" places is I think simply an artefact of refactoring 
> > and code evolution. Initially all IRQ handling was in one function, then 
> > later gen11 and display parts got split out as more platforms were 
> > added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
> > interrupt handling").
> >
> > As for the original rationale, it was described in commits like:
> >
> > 2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")

Looking at this one it looks that the raw usage came in place to fix
a macro issue, that we don't have anymore anyway.

> > c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")

Then, looking at this one, it sounds a good optimization.

> >
> > Obviosuly, once a portion of a handler was/is extracted, pointer caching 
> > to avoid uncore->regs reloads may not make full sense any more due 
> > function calls potentially overshadowing that cost.
> >
> > As for unclaimed debug, I would say it is probably okay to not burden 
> > the irq handlers with it, but if the display folks think a little bit of 
> > extra cost in this sub-handlers is fine that would sound plausible to me 
> > given the frequency of display related interrupts is low. 

Well, looking at the optimization above I always had the initial thought
on the low frequency of display interrupts, because I thought about hotplugs.
But perhaps an optimization in vblank ones would be desireable?

> > So for me 
> > patch is fine if it makes the display decoupling easier.
> >
> >> What would the interface be like if display were its own module? We
> >> couldn't just wrap it all in a bunch of macros and static inlines. Is
> >> the end result that display irq handling needs to call functions via
> >> pointers in another module? Or do we need to move the register level irq
> >> handling to xe and i915 cores, and handle the display parts at a higher
> >> abstraction level?
> >
> > AFAIR no trace variants were not for performance but to avoid log spam 
> > when debugging stuff. From things like busy/polling loops.
> 
> Bumping a forgotten topic.
> 
> Ville, Rodrigo, are we okay with the changes here?

I am in favor of this patch. Let's unify things. But perhaps study if
we need as a follow-up some optimization in vblank or any other display
irq and get that done inside intel_de_ mmio helpers?!

> 
> BR,
> Jani.
> 
> >
> > Regards,
> >
> > Tvrtko
> >>>
> >>> btw, if we drop the raw accesses, then we can probably drop (1) above.
> >>>
> >>> Lucas De Marchi
> >>>
> >>>>
> >>>> Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> >>>> ---
> >>>> drivers/gpu/drm/i915/display/intel_display_irq.c | 15 +++++++--------
> >>>> 1 file changed, 7 insertions(+), 8 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> index f846c5b108b5..d4ae9139be39 100644
> >>>> --- a/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> +++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
> >>>> @@ -1148,15 +1148,14 @@ void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
> >>>>
> >>>> u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
> >>>> {
> >>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> >>>> 	u32 iir;
> >>>>
> >>>> 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
> >>>> 		return 0;
> >>>>
> >>>> -	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
> >>>> +	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
> >>>> 	if (likely(iir))
> >>>> -		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
> >>>> +		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
> >>>>
> >>>> 	return iir;
> >>>> }
> >>>> @@ -1169,18 +1168,18 @@ void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
> >>>>
> >>>> void gen11_display_irq_handler(struct drm_i915_private *i915)
> >>>> {
> >>>> -	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
> >>>> -	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
> >>>> +	u32 disp_ctl;
> >>>>
> >>>> 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
> >>>> 	/*
> >>>> 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
> >>>> 	 * for the display related bits.
> >>>> 	 */
> >>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
> >>>> +	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
> >>>> +
> >>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
> >>>> 	gen8_de_irq_handler(i915, disp_ctl);
> >>>> -	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
> >>>> -		      GEN11_DISPLAY_IRQ_ENABLE);
> >>>> +	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
> >>>>
> >>>> 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
> >>>> }
> >>>> -- 
> >>>> 2.39.2
> >>>>
> >> 
> 
> -- 
> Jani Nikula, Intel
Ville Syrjälä Sept. 17, 2024, 2:48 p.m. UTC | #7
On Tue, Sep 17, 2024 at 10:02:19AM -0400, Rodrigo Vivi wrote:
> On Tue, Sep 17, 2024 at 01:58:19PM +0300, Jani Nikula wrote:
> > On Thu, 18 Apr 2024, Tvrtko Ursulin <tursulin@ursulin.net> wrote:
> > > On 18/04/2024 10:49, Jani Nikula wrote:
> > >> On Wed, 17 Apr 2024, Lucas De Marchi <lucas.demarchi@intel.com> wrote:
> > >>> On Mon, Apr 08, 2024 at 03:54:44PM GMT, Jani Nikula wrote:
> > >>>> The raw register reads/writes are there as micro-optimizations to avoid
> > >>>> multiple pointer indirections on uncore->regs. Presumably this is useful
> > >>>> when there are plenty of register reads/writes in the same
> > >>>> function. However, the display irq handling only has a few raw
> > >>>> reads/writes. Remove them for simplification.
> > >>>
> > >>> I think that comment didn't age well. Not to say there's something wrong
> > >>> with this commit, but just to make sure we are aware of the additional
> > >>> stuff going on and we if we are ok with that.
> > >>>
> > >>> using intel_de_read() in place of raw_reg_read() will do (for newer
> > >>> platforms):
> > >>>
> > >>> 	1) Read FPGA_DBG to detect unclaimed access before the actual read
> > >>> 	2) Find the relevant forcewake for that register, acquire and wait for ack
> > >>> 	3) readl(reg)
> > >>> 	4) Read FPGA_DBG to detect unclaimed access after the actual read
> > >>> 	5) Trace reg rw
> > >>>
> > >>> That's much more than a pointer indirection. Are we ok with that in the
> > >>> irq?  Also, I don't know why but we have variants to skip tracing (step
> > >>> 5 above), but on my books a disabled tracepoint is order of magnitudes
> > >>> less overhead than 1, 2 and 4.
> > >> 
> > >> Honestly, I don't really know.
> > >> 
> > >> The thing is, we have these ad hoc optimizations all over the place. Why
> > >> do we have the raw access in two places, but not everywhere in irq
> > >> handling? The pointer indirection thing really only makes sense if you
> > >> have a lot of access in a function, but that's not the case. You do have
> > >> a point about everything else.
> > >
> > > The "why only two" places is I think simply an artefact of refactoring 
> > > and code evolution. Initially all IRQ handling was in one function, then 
> > > later gen11 and display parts got split out as more platforms were 
> > > added. For example a3265d851e28 ("drm/i915/irq: Refactor gen11 display 
> > > interrupt handling").
> > >
> > > As for the original rationale, it was described in commits like:
> > >
> > > 2e4a5b25886c ("drm/i915: Prune gen8_gt_irq_handler")
> 
> Looking at this one it looks that the raw usage came in place to fix
> a macro issue, that we don't have anymore anyway.
> 
> > > c48a798a7447 ("drm/i915: Trim the ironlake+ irq handler")
> 
> Then, looking at this one, it sounds a good optimization.
> 
> > >
> > > Obviosuly, once a portion of a handler was/is extracted, pointer caching 
> > > to avoid uncore->regs reloads may not make full sense any more due 
> > > function calls potentially overshadowing that cost.
> > >
> > > As for unclaimed debug, I would say it is probably okay to not burden 
> > > the irq handlers with it, but if the display folks think a little bit of 
> > > extra cost in this sub-handlers is fine that would sound plausible to me 
> > > given the frequency of display related interrupts is low. 
> 
> Well, looking at the optimization above I always had the initial thought
> on the low frequency of display interrupts, because I thought about hotplugs.
> But perhaps an optimization in vblank ones would be desireable?
> 
> > > So for me 
> > > patch is fine if it makes the display decoupling easier.
> > >
> > >> What would the interface be like if display were its own module? We
> > >> couldn't just wrap it all in a bunch of macros and static inlines. Is
> > >> the end result that display irq handling needs to call functions via
> > >> pointers in another module? Or do we need to move the register level irq
> > >> handling to xe and i915 cores, and handle the display parts at a higher
> > >> abstraction level?
> > >
> > > AFAIR no trace variants were not for performance but to avoid log spam 
> > > when debugging stuff. From things like busy/polling loops.
> > 
> > Bumping a forgotten topic.
> > 
> > Ville, Rodrigo, are we okay with the changes here?
> 
> I am in favor of this patch. Let's unify things. But perhaps study if
> we need as a follow-up some optimization in vblank or any other display
> irq and get that done inside intel_de_ mmio helpers?!

There are probaly a lot of things to think about wrt. 
the register accessors:
- FPGA_DBG only detects display registers IIRC, so
  could perhaps avoid that stuff at build time for GT regs
- forcewake only affects GT so could avoid that 
  completely at build time for display regs
- the gsi_offset stuff could again be bypassed at build
  time for display regs. it generates rather ugly assembly
  with an extra branch for every mmio access. It should really
  be part of the register offset macros, but I guess no one wanted
  to update all of them?
- tracepoints generate a lot of extra junk so all the display
  functions that write a lot of registers look pretty horrendous
  on asm level, and often the compiler just emits a non-inline
  version of intel_de_write_fw() anyway, so you end up with a
  copy in pretty much every compilation unit, in which case it
  would be better to just not have it as a static inline.
- the "access to same cacheline hangs the machine" hw
  issues on hsw era hw needs the spinlock to serialize,
  but I'm not sure if this even affects all the registers
  or just some subset of them

But yeah, if we want to optimize things I think it'd be nice
if we could achieve most of it without so many diffrent ways
of doing this stuff, especially as we've not take any kind of
consistent approach of when to use which mechanism.

I have occasionally thought that maybe all irq handlers should
stick to unlocked accesses as an optimization, but again, not
100% sure if that's safe wrt. the hsw hang issue. It'd also
mean not detecting unclaimed register accesses directly from
irq handlers. I suppose one option would be to the checks
just once around the irq handler?

Anyways, IMO plow ahead for now.
Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Jani Nikula Sept. 19, 2024, 1:14 p.m. UTC | #8
On Tue, 17 Sep 2024, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> Anyways, IMO plow ahead for now.
> Acked-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Thanks for the ack and review, pushed to din.

BR,
Jani.
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/display/intel_display_irq.c b/drivers/gpu/drm/i915/display/intel_display_irq.c
index f846c5b108b5..d4ae9139be39 100644
--- a/drivers/gpu/drm/i915/display/intel_display_irq.c
+++ b/drivers/gpu/drm/i915/display/intel_display_irq.c
@@ -1148,15 +1148,14 @@  void gen8_de_irq_handler(struct drm_i915_private *dev_priv, u32 master_ctl)
 
 u32 gen11_gu_misc_irq_ack(struct drm_i915_private *i915, const u32 master_ctl)
 {
-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
 	u32 iir;
 
 	if (!(master_ctl & GEN11_GU_MISC_IRQ))
 		return 0;
 
-	iir = raw_reg_read(regs, GEN11_GU_MISC_IIR);
+	iir = intel_de_read(i915, GEN11_GU_MISC_IIR);
 	if (likely(iir))
-		raw_reg_write(regs, GEN11_GU_MISC_IIR, iir);
+		intel_de_write(i915, GEN11_GU_MISC_IIR, iir);
 
 	return iir;
 }
@@ -1169,18 +1168,18 @@  void gen11_gu_misc_irq_handler(struct drm_i915_private *i915, const u32 iir)
 
 void gen11_display_irq_handler(struct drm_i915_private *i915)
 {
-	void __iomem * const regs = intel_uncore_regs(&i915->uncore);
-	const u32 disp_ctl = raw_reg_read(regs, GEN11_DISPLAY_INT_CTL);
+	u32 disp_ctl;
 
 	disable_rpm_wakeref_asserts(&i915->runtime_pm);
 	/*
 	 * GEN11_DISPLAY_INT_CTL has same format as GEN8_MASTER_IRQ
 	 * for the display related bits.
 	 */
-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL, 0x0);
+	disp_ctl = intel_de_read(i915, GEN11_DISPLAY_INT_CTL);
+
+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, 0);
 	gen8_de_irq_handler(i915, disp_ctl);
-	raw_reg_write(regs, GEN11_DISPLAY_INT_CTL,
-		      GEN11_DISPLAY_IRQ_ENABLE);
+	intel_de_write(i915, GEN11_DISPLAY_INT_CTL, GEN11_DISPLAY_IRQ_ENABLE);
 
 	enable_rpm_wakeref_asserts(&i915->runtime_pm);
 }