
[15/17] drm/i915: Increase GuC log buffer size to reduce flush interrupts

Message ID 1468158084-22028-16-git-send-email-akash.goel@intel.com (mailing list archive)
State New, archived

Commit Message

akash.goel@intel.com July 10, 2016, 1:41 p.m. UTC
From: Akash Goel <akash.goel@intel.com>

In cases where the GuC generates logs at a very high rate, the rate of
flush interrupts is correspondingly very high.
So far a total of 8 pages were allocated for storing both ISR & DPC logs.
As per the half-full draining protocol followed by the GuC, a flush
interrupt is sent whenever a log buffer becomes half full, so by doubling
the number of pages the frequency of flush interrupts can be cut down to
almost half, which then helps in reducing the logging overhead.
So now 8 pages apiece are allocated for the ISR & DPC logs.

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/intel_guc_fwif.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
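
For context (an editor's sketch, not part of the patch): these page-count
macros are packed into the 32-bit log-control value handed to the GuC at
init time, alongside GUC_LOG_VALID and GUC_LOG_NOTIFY_ON_HALF_FULL from
the same header. The latter selects the half-full draining protocol
referred to above - the firmware raises a flush interrupt whenever a log
buffer crosses half full - which is why doubling the page counts roughly
halves the interrupt rate for a given log rate. Roughly how the flags are
composed (modelled on the guc_log_create() logic of this era; the helper
and parameter names below are assumptions):

    #include "intel_guc_fwif.h"

    /* Sketch only: compose the GuC log-control value. */
    static u32 guc_log_control_flags(u32 ggtt_offset_in_pages)
    {
            u32 flags;

            /* Each *_PAGES field encodes "number of pages minus one";
             * GUC_LOG_NOTIFY_ON_HALF_FULL asks the firmware to raise a
             * flush interrupt when a buffer becomes half full.
             */
            flags = GUC_LOG_VALID | GUC_LOG_NOTIFY_ON_HALF_FULL |
                    (GUC_LOG_CRASH_PAGES << GUC_LOG_CRASH_SHIFT) |
                    (GUC_LOG_DPC_PAGES << GUC_LOG_DPC_SHIFT) |
                    (GUC_LOG_ISR_PAGES << GUC_LOG_ISR_SHIFT);

            return (ggtt_offset_in_pages << GUC_LOG_BUF_ADDR_SHIFT) | flags;
    }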

Comments

Tvrtko Ursulin July 15, 2016, 11:57 a.m. UTC | #1
On 10/07/16 14:41, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
>
> In cases where the GuC generates logs at a very high rate, the rate of
> flush interrupts is correspondingly very high.
> So far a total of 8 pages were allocated for storing both ISR & DPC logs.
> As per the half-full draining protocol followed by the GuC, a flush
> interrupt is sent whenever a log buffer becomes half full, so by doubling
> the number of pages the frequency of flush interrupts can be cut down to
> almost half, which then helps in reducing the logging overhead.
> So now 8 pages apiece are allocated for the ISR & DPC logs.
>
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>   drivers/gpu/drm/i915/intel_guc_fwif.h | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
> index 1de6928..7521ed5 100644
> --- a/drivers/gpu/drm/i915/intel_guc_fwif.h
> +++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
> @@ -104,9 +104,9 @@
>   #define   GUC_LOG_ALLOC_IN_MEGABYTE	(1 << 3)
>   #define   GUC_LOG_CRASH_PAGES		1
>   #define   GUC_LOG_CRASH_SHIFT		4
> -#define   GUC_LOG_DPC_PAGES		3
> +#define   GUC_LOG_DPC_PAGES		7
>   #define   GUC_LOG_DPC_SHIFT		6
> -#define   GUC_LOG_ISR_PAGES		3
> +#define   GUC_LOG_ISR_PAGES		7
>   #define   GUC_LOG_ISR_SHIFT		9
>   #define   GUC_LOG_BUF_ADDR_SHIFT	12
>
> @@ -436,9 +436,9 @@ enum guc_log_buffer_type {
>    *        |   Crash dump state header     |
>    * Page1  +-------------------------------+
>    *        |           ISR logs            |
> - * Page5  +-------------------------------+
> - *        |           DPC logs            |
>    * Page9  +-------------------------------+
> + *        |           DPC logs            |
> + * Page17 +-------------------------------+
>    *        |         Crash Dump logs       |
>    *        +-------------------------------+
>    *
>

I don't mind - but does it help? And how much and for what? Haven't you 
later found that the uncached reads were the main issue?

Regards,

Tvrtko
akash.goel@intel.com July 15, 2016, 2:42 p.m. UTC | #2
On 7/15/2016 5:27 PM, Tvrtko Ursulin wrote:
>
> On 10/07/16 14:41, akash.goel@intel.com wrote:
>> [patch quoted in full - snipped]
>
> I don't mind - but does it help? And how much and for what? Haven't you
> later found that the uncached reads were the main issue?
This change, along with the kthread patch, helped reduce the overflow
counts and even eliminated them for some benchmarks.

The impending optimization for uncached reads should bring further
improvements, but in my view, notwithstanding the improvement w.r.t. the
overflow count, this is still a better configuration to work with, as
the flush interrupt frequency is cut down to half and I am not able to
see any apparent downsides to it.

Best Regards
Akash
>
> Regards,
>
> Tvrtko
Tvrtko Ursulin July 15, 2016, 3:07 p.m. UTC | #3
On 15/07/16 15:42, Goel, Akash wrote:
> On 7/15/2016 5:27 PM, Tvrtko Ursulin wrote:
>>
>> On 10/07/16 14:41, akash.goel@intel.com wrote:
>>> [patch quoted in full - snipped]
>>
>> I don't mind - but does it help? And how much and for what? Haven't you
>> later found that the uncached reads were the main issue?
> This change, along with the kthread patch, helped reduce the overflow
> counts and even eliminated them for some benchmarks.
>
> The impending optimization for uncached reads should bring further
> improvements, but in my view, notwithstanding the improvement w.r.t. the
> overflow count, this is still a better configuration to work with, as
> the flush interrupt frequency is cut down to half and I am not able to
> see any apparent downsides to it.

I was primarily thinking of going with the minimal and simplest set of
patches to implement the feature.

The logic was that apparently none of the smart and complex
optimisations managed to solve the dropped-interrupt issue, until the
slowness of the uncached reads was discovered to be the real/main issue.

So it seems that is something that definitely needs to be implemented. 
(Whether or not it will be possible to use SSE instructions to do the 
read I don't know.)
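
For reference, the speed-up being alluded to works as follows: on x86,
reads from uncached/write-combined memory can be accelerated with the
SSE4.1 non-temporal load (MOVNTDQA), which fetches a full 64-byte line
into a streaming buffer instead of issuing small serialized uncached
reads. A minimal user-space-flavoured sketch, assuming 16-byte-aligned
buffers and a length that is a multiple of 64; in the kernel this would
also need kernel_fpu_begin()/kernel_fpu_end(). i915 later gained a
helper in this area (i915_memcpy_from_wc()), but the code below is only
an illustration, not that implementation:

    #include <smmintrin.h>  /* SSE4.1: _mm_stream_load_si128() */
    #include <stddef.h>

    /* Copy from a WC/uncached source using non-temporal loads.
     * Assumes 16-byte-aligned pointers and len a multiple of 64.
     */
    static void copy_from_wc(void *dst, void *src, size_t len)
    {
            __m128i *d = dst;
            __m128i *s = src;

            while (len >= 64) {
                    /* Four 16-byte non-temporal loads per iteration. */
                    __m128i a = _mm_stream_load_si128(s++);
                    __m128i b = _mm_stream_load_si128(s++);
                    __m128i c = _mm_stream_load_si128(s++);
                    __m128i e = _mm_stream_load_si128(s++);

                    _mm_store_si128(d++, a);
                    _mm_store_si128(d++, b);
                    _mm_store_si128(d++, c);
                    _mm_store_si128(d++, e);
                    len -= 64;
            }
    }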

Assuming it is possible, then the question is whether there is need for 
all the other optimisations. Ie. do we need the kthread with rtprio or 
would a simple worker be enough? Do we need the new i915 param for 
tweaking the relay sub-buffers? Do we need the increase of the log 
buffer size? The extra patch to do smarter reads?
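
To make that trade-off concrete, here is a hedged sketch of the two
shapes being compared (all names below are illustrative, not the
series' code): a dedicated kthread that boosts itself to rt priority so
the flush work preempts normal tasks, versus a plain work item queued
from the interrupt handler:

    #include <linux/atomic.h>
    #include <linux/kthread.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    static DECLARE_WAIT_QUEUE_HEAD(flush_wq);
    static atomic_t flush_pending = ATOMIC_INIT(0);

    /* Called from the flush interrupt: note the event and return. */
    static void guc_log_flush_irq(void)
    {
            atomic_inc(&flush_pending);
            wake_up(&flush_wq);
    }

    /* Option A: dedicated rt-priority kthread, started with e.g.
     * kthread_run(guc_log_flush_thread, NULL, "guc_logd").
     */
    static int guc_log_flush_thread(void *data)
    {
            struct sched_param param = { .sched_priority = 1 };

            sched_setscheduler(current, SCHED_FIFO, &param);

            while (!kthread_should_stop()) {
                    wait_event_interruptible(flush_wq,
                                             atomic_read(&flush_pending) ||
                                             kthread_should_stop());
                    if (atomic_xchg(&flush_pending, 0)) {
                            /* drain the log buffer pages here */
                    }
            }
            return 0;
    }

    /* Option B would replace the kthread with a work item - simpler,
     * but it runs at normal priority and can be delayed:
     *   INIT_WORK(&flush_work, flush_fn);
     *   ... schedule_work(&flush_work) from the interrupt handler.
     */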

If we do not have the issue of dropped interrupts even with none of
these extra patches applied, then we could afford to not bother with
them now. That would make the series shorter, the review easier and get
the feature in quicker.

Or maybe we do need all the advanced stuff, I don't know, I am just 
asking the question and would like to see some data.

Regards,

Tvrtko
akash.goel@intel.com July 15, 2016, 4:20 p.m. UTC | #4
On 7/15/2016 8:37 PM, Tvrtko Ursulin wrote:
>
> [earlier discussion and patch snipped]
>
> I was primarily thinking of going with the minimal and simplest set
> of patches to implement the feature.
>
I second that and am working with the same intent.

> The logic was that apparently none of the smart and complex
> optimisations managed to solve the dropped-interrupt issue, until the
> slowness of the uncached reads was discovered to be the real/main issue.
>
> So it seems that is something that definitely needs to be implemented.
> (Whether or not it will be possible to use SSE instructions to do the
> read I don't know.)
>

The log buffer resizing and rt-priority kthread changes have definitely
helped significantly.

Only of late did we realize that there is a potential way to speed up
uncached reads also. Moreover, I am yet to test that on the kernel side.
So until that is tested & proves to be enough, we have to rely on the
other optimizations & can't dismiss them.

> Assuming it is possible, then the question is whether there is need for
> all the other optimisations. Ie. do we need the kthread with rtprio or
> would a simple worker be enough?
I think we can take a call once we have the results with the uncached
read optimization.

> Do we need the new i915 param for tweaking the relay sub-buffers?
In my opinion it will be really useful to have this provision, as I
tried to explain in the other mail.

> Do we need the increase of the log buffer size?
This seems to be a benign change which is definitely good to have, but
again we can decide on it once we have the results.

> The extra patch to do smarter reads?
>
> If we do not have the issue of dropped interrupts even with none of
> these extra patches applied, then we could afford to not bother with
> them now. That would make the series shorter, the review easier and
> get the feature in quicker.
>
I agree with you.
I had none of these optimizations in the initial version of the series,
but was compelled to add them later when I realized the rate at which
the GuC was generating logs.

Best regards
Akash

> Or maybe we do need all the advanced stuff, I don't know, I am just
> asking the question and would like to see some data.
>
> Regards,
>
> Tvrtko
Tvrtko Ursulin July 18, 2016, 9:54 a.m. UTC | #5
On 15/07/16 17:20, Goel, Akash wrote:
> On 7/15/2016 8:37 PM, Tvrtko Ursulin wrote:
>> [earlier discussion and patch snipped]
>>
>> I was primarily thinking of going with the minimal and simplest set
>> of patches to implement the feature.
>>
> I second that and am working with the same intent.
>
>> The logic was that apparently none of the smart and complex
>> optimisations managed to solve the dropped-interrupt issue, until the
>> slowness of the uncached reads was discovered to be the real/main issue.
>>
>> So it seems that is something that definitely needs to be implemented.
>> (Whether or not it will be possible to use SSE instructions to do the
>> read I don't know.)
>>
>
> The log buffer resizing and rt-priority kthread changes have definitely
> helped significantly.
>
> Only of late did we realize that there is a potential way to speed up
> uncached reads also. Moreover, I am yet to test that on the kernel side.
> So until that is tested & proves to be enough, we have to rely on the
> other optimizations & can't dismiss them.

Maybe; it depends on whether, as I thought was the case, none of the
other optimizations actually enabled drop-free logging in all
interesting scenarios.

If we conclude that simply improving the copy speed removes the need for 
any other optimisations and complications, we can talk about whether 
every individual one of those still makes sense.

>> Assuming it is possible, then the question is whether there is need for
>> all the other optimisations. Ie. do we need the kthread with rtprio or
>> would a simple worker be enough?
> I think we can take a call once we have the results with the uncached
> read optimization.

Agreed. Let's see how that works out and then discuss how the final
series should look.

Regards,

Tvrtko
akash.goel@intel.com July 18, 2016, 12:35 p.m. UTC | #6
On 7/18/2016 3:24 PM, Tvrtko Ursulin wrote:
>
> [earlier discussion and patch snipped]
>
> Maybe; it depends on whether, as I thought was the case, none of the
> other optimizations actually enabled drop-free logging in all
> interesting scenarios.
>
> If we conclude that simply improving the copy speed removes the need for
> any other optimisations and complications, we can talk about whether
> every individual one of those still makes sense.
>
In my opinion we should keep this change, regardless of the copying
speed-up. Moreover, this is a straightforward change.

Actually this also helps in reducing the output log file size, apart
from reducing the flush interrupt count.
With the original settings, 44 KB was needed for one snapshot.
With the modified settings, 76 KB is needed for one snapshot, but that
is equivalent to 2 snapshots of the original setting.
So that is a 12 KB saving for every 88 KB, over the original setting.
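
To spell that arithmetic out (an editor's sketch: it assumes 4 KB pages,
each GUC_LOG_*_PAGES macro meaning "count minus one", and one leading
page holding the three buffer-state headers):

    #define LOG_PAGE        4096    /* bytes */

    /* header page + ISR pages + DPC pages + crash-dump pages */
    #define OLD_SNAPSHOT    ((1 + (3 + 1) + (3 + 1) + (1 + 1)) * LOG_PAGE) /* 44 KB */
    #define NEW_SNAPSHOT    ((1 + (7 + 1) + (7 + 1) + (1 + 1)) * LOG_PAGE) /* 76 KB */

    /* One new snapshot carries the ISR/DPC payload of two old ones,
     * while writing the header page and crash-dump area only once:
     *   2 * 44 KB - 76 KB = 12 KB saved per 88 KB of old-style output.
     */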

Best regards
Akash

Tvrtko Ursulin July 18, 2016, 1:08 p.m. UTC | #7
On 18/07/16 13:35, Goel, Akash wrote:
>
> [earlier discussion and patch snipped]
>
> In my opinion we should keep this change, regardless of the copying
> speed-up. Moreover, this is a straightforward change.
>
> Actually this also helps in reducing the output log file size, apart
> from reducing the flush interrupt count.
> With the original settings, 44 KB was needed for one snapshot.
> With the modified settings, 76 KB is needed for one snapshot, but that
> is equivalent to 2 snapshots of the original setting.
> So that is a 12 KB saving for every 88 KB, over the original setting.

That is indeed a good benefit. I did not realize there was this space
wastage problem.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

Patch

diff --git a/drivers/gpu/drm/i915/intel_guc_fwif.h b/drivers/gpu/drm/i915/intel_guc_fwif.h
index 1de6928..7521ed5 100644
--- a/drivers/gpu/drm/i915/intel_guc_fwif.h
+++ b/drivers/gpu/drm/i915/intel_guc_fwif.h
@@ -104,9 +104,9 @@ 
 #define   GUC_LOG_ALLOC_IN_MEGABYTE	(1 << 3)
 #define   GUC_LOG_CRASH_PAGES		1
 #define   GUC_LOG_CRASH_SHIFT		4
-#define   GUC_LOG_DPC_PAGES		3
+#define   GUC_LOG_DPC_PAGES		7
 #define   GUC_LOG_DPC_SHIFT		6
-#define   GUC_LOG_ISR_PAGES		3
+#define   GUC_LOG_ISR_PAGES		7
 #define   GUC_LOG_ISR_SHIFT		9
 #define   GUC_LOG_BUF_ADDR_SHIFT	12
 
@@ -436,9 +436,9 @@  enum guc_log_buffer_type {
  *        |   Crash dump state header     |
  * Page1  +-------------------------------+
  *        |           ISR logs            |
- * Page5  +-------------------------------+
- *        |           DPC logs            |
  * Page9  +-------------------------------+
+ *        |           DPC logs            |
+ * Page17 +-------------------------------+
  *        |         Crash Dump logs       |
  *        +-------------------------------+
  *