
[RFC,0/1] Splitting up platform-specific calls

Message ID: 20220120221652.207255-1-casey.g.bowman@intel.com

Message

Bowman, Casey G Jan. 20, 2022, 10:16 p.m. UTC
In this RFC I would like to ask the community their thoughts
on how we can best handle splitting architecture-specific
calls.

I would like to address the following:

1. How do we want to split architecture calls? Different object files
per platform? Separate function calls within the same object file?

2. How do we address dummy functions? If we have a function call that is
used for one or more platforms, but is not used in another, what should
we do for this case?

I've given an example of splitting an architecture call
in my patch with run_as_guest() being split into different
implementations for x86 and arm64 in separate object files, sharing
a single header.

Another suggestion from Michael (michael.cheng@intel.com) involved
using a single object file, a single header, and splitting various
function calls via ifdefs in the header file.
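
Roughly, that suggestion would look something like the following sketch
(function name and bodies are made up here, just to illustrate the shape):

/* i915_platform.h -- one shared header, per-arch implementations via ifdefs */
#if IS_ENABLED(CONFIG_X86)
#include <asm/cpufeature.h>

static inline bool i915_run_as_guest(void)
{
        /* x86: the CPU advertises a hypervisor feature bit */
        return boot_cpu_has(X86_FEATURE_HYPERVISOR);
}
#else
static inline bool i915_run_as_guest(void)
{
        /* no detection wired up for other architectures yet */
        return false;
}
#endif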

I would appreciate any input on how we can avoid scaling issues when
including multiple architectures and multiple functions (as the number
of function calls will inevitably increase with more architectures).

Casey Bowman (1):
  i915/drm: Split out x86 and arm64 functionality

 drivers/gpu/drm/i915/Makefile              |  4 +++
 drivers/gpu/drm/i915/i915_drv.h            |  6 +---
 drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
 drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
 5 files changed, 87 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_platform.h
 create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
 create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c

Comments

Bowman, Casey G Feb. 3, 2022, 11:58 p.m. UTC | #1
CC'ing more reviewers for comments.

On 1/20/22 14:16, Casey Bowman wrote:
> In this RFC I would like to ask the community their thoughts
> on how we can best handle splitting architecture-specific
> calls.
>
> I would like to address the following:
>
> 1. How do we want to split architecture calls? Different object files
> per platform? Separate function calls within the same object file?
>
> 2. How do we address dummy functions? If we have a function call that is
> used for one or more platforms, but is not used in another, what should
> we do for this case?
>
> I've given an example of splitting an architecture call
> in my patch with run_as_guest() being split into different
> implementations for x86 and arm64 in separate object files, sharing
> a single header.
>
> Another suggestion from Michael (michael.cheng@intel.com) involved
> using a single object file, a single header, and splitting various
> function calls via ifdefs in the header file.
>
> I would appreciate any input on how we can avoid scaling issues when
> including multiple architectures and multiple functions (as the number
> of function calls will inevitably increase with more architectures).
>
> Casey Bowman (1):
>    i915/drm: Split out x86 and arm64 functionality
>
>   drivers/gpu/drm/i915/Makefile              |  4 +++
>   drivers/gpu/drm/i915/i915_drv.h            |  6 +---
>   drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
>   drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
>   5 files changed, 87 insertions(+), 5 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_platform.h
>   create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
>   create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c
>
Jani Nikula Feb. 7, 2022, 1:02 p.m. UTC | #2
On Thu, 20 Jan 2022, Casey Bowman <casey.g.bowman@intel.com> wrote:
> In this RFC I would like to ask the community their thoughts
> on how we can best handle splitting architecture-specific
> calls.
>
> I would like to address the following:
>
> 1. How do we want to split architecture calls? Different object files
> per platform? Separate function calls within the same object file?
>
> 2. How do we address dummy functions? If we have a function call that is
> used for one or more platforms, but is not used in another, what should
> we do for this case?
>
> I've given an example of splitting an architecture call
> in my patch with run_as_guest() being split into different
> implementations for x86 and arm64 in separate object files, sharing
> a single header.
>
> Another suggestion from Michael (michael.cheng@intel.com) involved
> using a single object file, a single header, and splitting various
> function calls via ifdefs in the header file.
>
> I would appreciate any input on how we can avoid scaling issues when
> including multiple architectures and multiple functions (as the number
> of function calls will inevitably increase with more architectures).

Personally I think the functionality is best kept organized by, well,
functionality, not by platform. Otherwise the platform files will
contain all sorts of code with the only common denominator being the
platform.

You're also likely to have static platform specific functions, which
would needlessly have to be made non-static if the split was by files.

I'd just put the implementations for different platforms next to each
other:

#if IS_ENABLED(CONFIG_X86)
...
#elif IS_ENABLED(CONFIG_ARM64)
...
#endif

Usually the declarations would be identical and there'd only be one,
without #ifs in the header.
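
A minimal sketch of what I mean, with made-up names:

some_header.h:

bool i915_run_as_guest(void);

some_source.c:

#include "some_header.h"

#if IS_ENABLED(CONFIG_X86)
#include <asm/cpufeature.h>

bool i915_run_as_guest(void)
{
        /* x86: the CPU advertises a hypervisor feature bit */
        return boot_cpu_has(X86_FEATURE_HYPERVISOR);
}
#elif IS_ENABLED(CONFIG_ARM64)
bool i915_run_as_guest(void)
{
        /* arm64: nothing to detect yet, assume bare metal */
        return false;
}
#endif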


BR,
Jani.

>
> Casey Bowman (1):
>   i915/drm: Split out x86 and arm64 functionality
>
>  drivers/gpu/drm/i915/Makefile              |  4 +++
>  drivers/gpu/drm/i915/i915_drv.h            |  6 +---
>  drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
>  drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
>  5 files changed, 87 insertions(+), 5 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_platform.h
>  create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
>  create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c
Tvrtko Ursulin Feb. 7, 2022, 3:36 p.m. UTC | #3
On 20/01/2022 22:16, Casey Bowman wrote:
> In this RFC I would like to ask the community their thoughts
> on how we can best handle splitting architecture-specific
> calls.
> 
> I would like to address the following:
> 
> 1. How do we want to split architecture calls? Different object files
> per platform? Separate function calls within the same object file?

If we are talking about per-platform divergence of significant functions (not necessarily significant in size, but in how high they sit in the i915 stack), I agree with Jani that top-level per-platform organisation is not the best choice.

On the other hand, I doubt there should be many, if any, such functions. In practice I think it should only be low-level stuff which diverges.

On a concrete example..
  
> 2. How do we address dummy functions? If we have a function call that is
> used for one or more platforms, but is not used in another, what should
> we do for this case?

... depends on the situation. Sometimes a flavour of "warn on once" (WARN_ON_ONCE) can be okay I guess, but also, why not a build bug (BUILD_BUG_ON)? Because..

> 
> I've given an example of splitting an architecture call
> in my patch with run_as_guest() being split into different
> implementations for x86 and arm64 in separate object files, sharing
> a single header.

... run_as_guest may be a very tricky example, given that it is called from intel_vtd_active, which has a number of callers.

What is the correct behaviour on Arm in this example? That none of these call sites will run on Arm? Or do you expect the warn-on added in this patch to trigger as a demonstration? If so, what is the plan going forward? We can take one example and talk about it hypothetically:

./i915_driver.c:        drm_printf(p, "iommu: %s\n", enableddisabled(intel_vtd_active(i915)));

What is the "fix" (refactor) for Arm here? Looks like a new top-level function is needed which does not carry the intel_vtd_ prefix but something more generic. That one could then legitimately "warn on once", while for intel_vtd_active it would be wrong to do so.

And when I say it is needed.. well, perhaps it is not strictly needed in this case, but in some other cases I think we go back to the problem I stated some months ago, which is that I suspect the use of intel_vtd_active is overloaded. I think it is currently used to answer all of these questions:

1. Is the IOMMU active, just for information?
2. Is the IOMMU active and do we want to counteract the performance hit, say by using huge pages, adjusting the display bandwidth calculations or whatever? (In which case we may also want to distinguish between pass-through and translation modes.)
3. Is a potentially buggy IOMMU active that we need to work around?

Lumping all of these under one check kind of worked with one IOMMU implementation, but does it work with a different IOMMU?

Which, I mean, leads to the conclusion that this particular function is a tricky example for answering the questions asked. :)

> 
> Another suggestion from Michael (michael.cheng@intel.com) involved
> using a single object file, a single header, and splitting various
> function calls via ifdefs in the header file.

In principle, most of what you have outlined sounds acceptable to me, with the difference that I would not use an i915_platform prefix, but for this particular example something like an i915_hypervisor prefix.

Then I would prepare i915 with the same scheme the kernel uses, not just for source file divergence but for header files as well. That is:

some_source.c:

#include "i915_hypervisor.h"

i915_hypervisor.h:

#include "platform/i915_hypervisor.h"

Then in i915 root you could have:

platforms/x86/include/platform/i915_hypervisor.h
platforms/arm/include/platform/i915_hypervisor.h

And some kbuild stuff to make that work. Is this doable and does it make sense? Per-platform source files could live in there as well.
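
For example, the x86 flavour of that header could be as simple as (content made up, just to show the shape):

/* platforms/x86/include/platform/i915_hypervisor.h */
#ifndef __PLATFORM_I915_HYPERVISOR_H__
#define __PLATFORM_I915_HYPERVISOR_H__

#include <asm/cpufeature.h>

static inline bool i915_run_as_guest(void)
{
        /* x86 specific hypervisor detection */
        return boot_cpu_has(X86_FEATURE_HYPERVISOR);
}

#endif /* __PLATFORM_I915_HYPERVISOR_H__ */

The arm flavour would provide the same static inline returning false, and kbuild would add the matching platforms/<arch>/include directory to the header search path depending on CONFIG_X86 / CONFIG_ARM64.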

Same scheme for i915_clflush would work as well.

Regards,

Tvrtko

> 
> I would appreciate any input on how we can avoid scaling issues when
> including multiple architectures and multiple functions (as the number
> of function calls will inevitably increase with more architectures).
> 
> Casey Bowman (1):
>    i915/drm: Split out x86 and arm64 functionality
> 
>   drivers/gpu/drm/i915/Makefile              |  4 +++
>   drivers/gpu/drm/i915/i915_drv.h            |  6 +---
>   drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
>   drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
>   5 files changed, 87 insertions(+), 5 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_platform.h
>   create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
>   create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c
>
Bowman, Casey G Feb. 9, 2022, 5:07 a.m. UTC | #4
On 2/7/22 05:02, Jani Nikula wrote:
> On Thu, 20 Jan 2022, Casey Bowman <casey.g.bowman@intel.com> wrote:
>> In this RFC I would like to ask the community their thoughts
>> on how we can best handle splitting architecture-specific
>> calls.
>>
>> I would like to address the following:
>>
>> 1. How do we want to split architecture calls? Different object files
>> per platform? Separate function calls within the same object file?
>>
>> 2. How do we address dummy functions? If we have a function call that is
>> used for one or more platforms, but is not used in another, what should
>> we do for this case?
>>
>> I've given an example of splitting an architecture call
>> in my patch with run_as_guest() being split into different
>> implementations for x86 and arm64 in separate object files, sharing
>> a single header.
>>
>> Another suggestion from Michael (michael.cheng@intel.com) involved
>> using a single object file, a single header, and splitting various
>> function calls via ifdefs in the header file.
>>
>> I would appreciate any input on how we can avoid scaling issues when
>> including multiple architectures and multiple functions (as the number
>> of function calls will inevitably increase with more architectures).
> Personally I think the functionality is best kept organized by, well,
> functionality, not by platform. Otherwise the platform files will
> contain all sorts of code with the only common denominator being the
> platform.
>
> You're also likely to have static platform specific functions, which
> would needlessly have to be made non-static if the split was by files.
>
> I'd just put the implementations for different platforms next to each
> other:
>
> #if IS_ENABLED(CONFIG_X86)
> ...
> #elif IS_ENABLED(CONFIG_ARM64)
> ...
> #endif
>
> Usually the declarations would be identical and there'd only be one,
> without #ifs in the header.

Thanks for the feedback, Jani.

I think this is the proper takeaway for future functions that do have
separate implementations for differing architectures.

As for null behavior, as in the example I gave, I think Tvrtko is right
about run_as_guest being a tricky example. I think I need to
re-evaluate that function and think of another solution altogether
for that instance.

I think this will also set the precedent for null cases: we will need to rethink implementations for cases that don't really have an arch-specific implementation other than returning null (though some exceptions may exist).

>
> BR,
> Jani.
>
>> Casey Bowman (1):
>>    i915/drm: Split out x86 and arm64 functionality
>>
>>   drivers/gpu/drm/i915/Makefile              |  4 +++
>>   drivers/gpu/drm/i915/i915_drv.h            |  6 +---
>>   drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
>>   drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
>>   5 files changed, 87 insertions(+), 5 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform.h
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c
Bowman, Casey G Feb. 9, 2022, 5:25 a.m. UTC | #5
On 2/7/22 07:36, Tvrtko Ursulin wrote:
>
> On 20/01/2022 22:16, Casey Bowman wrote:
>> In this RFC I would like to ask the community their thoughts
>> on how we can best handle splitting architecture-specific
>> calls.
>>
>> I would like to address the following:
>>
>> 1. How do we want to split architecture calls? Different object files
>> per platform? Separate function calls within the same object file?
>
> If we are talking about per-platform divergence of significant 
> functions (not necessarily in size but their height position in the 
> i915 stack) I agree with Jani that top-level per platform organisation 
> is not the best choice.
>
> On the other hand I doubt that there should be many, if any, such 
> functions. In practice I think it should be only low level stuff which 
> diverges.
I agree. As I said in my reply to Jani, I think that going forward, #if IS_ENABLED blocks for arch-specific code should be used sparingly, and only in the cases where we do have a genuine arch-specific implementation (like low-level calls), rather than just a 'return null' stub, as in my case.
>
> On a concrete example..
>
>> 2. How do we address dummy functions? If we have a function call that is
>> used for one or more platforms, but is not used in another, what should
>> we do for this case?
>
> ... depends on the situation. Sometimes a flavour of "warn on once" 
> can be okay I guess, but also why not build bug on? Because..
True. Given Jani's comments and yours, I'm thinking that when we look at a potential arch-specific function where we're just returning null or something similar, we should re-evaluate the function and see whether a different change is needed there.
>
>>
>> I've given an example of splitting an architecture call
>> in my patch with run_as_guest() being split into different
>> implementations for x86 and arm64 in separate object files, sharing
>> a single header.
>
> ... run_as_guest may be a very tricky example, given it is called from 
> intel_vtd_active which has a number of callers.
>
> What is correct behaviour on Arm in this example? None of these call 
> sites will run on Arm? Or that you expect the warn on added in this 
> patch to trigger as a demonstration? If so, what is the plan going 
> forward? We can take one example and talk about it hypothetically:
>
> ./i915_driver.c:        drm_printf(p, "iommu: %s\n", 
> enableddisabled(intel_vtd_active(i915)));
>
> What is the "fix" (refactor) for Arm here? Looks like a new top-level 
> function is needed which does not carry the intel_vtd_ prefix but 
> something more generic. That one could then legitimately "warn on 
> once", while for intel_vtd_active it would be wrong to do so.

Good point, run_as_guest might be a more challenging example than I thought.

In my mind I was thinking of simply returning null for arm64 here, as I don't believe arm64 would be making use of the IOMMU for our cases (at least in the short term).
I think this example function specifically needs to be reworked in some way, as you mentioned, possibly by refactoring intel_vtd_active or something along those lines.

>
> And when I say it is needed.. well perhaps it is not strictly needed 
> in this case, but in some other cases I think we go back to the 
> problem I stated some months ago and that is that I suspect use of 
> intel_vtd_active is overloaded. I think it is currently used to answer 
> all these questions: 1. Is the IOMMU active, just for information.; 2. 
> Is the IOMMU active and we want to counteract the performance hit by 
> say using huge pages, adjusting the display bandwidth calculations or 
> whatever. (In which case we also may want to distinguish between 
> pass-through and translation modes.); 3. Is a potentially buggy IOMMU 
> active and we need to work around it. All these under one kind of 
> worked with one iommu implementation but does it with a different IOMMU?
>
> Which I mean leads to end conclusion that this particular function is 
> a tricky example to answer the questions asked. :)
>
>>
>> Another suggestion from Michael (michael.cheng@intel.com) involved
>> using a single object file, a single header, and splitting various
>> function calls via ifdefs in the header file.
>
> In principle, mostly what you have outlined sounds acceptable to me, 
> with the difference that I would not use i915_platform, but for this 
> particular example something like i915_hypervisor prefix.
>
> Then I would prepare i915 with the same scheme kernel uses, not just 
> for source file divergence, but header file as well. That is:
>
> some_source.c:
>
> #include "i915_hypervisor.h"
>
> i915_hypervisor.h:
>
> #include "platform/i915_hypervisor.h"
>
> Then in i915 root you could have:
>
> platforms/x86/include/platform/i915_hypervisor.h
> platforms/arm/include/platform/i915_hypervisor.h
>
> And some kbuild stuff to make that work. Is this doable and does it 
> make sense? Per-platform source files could live in there as well.
>
> Same scheme for i915_clflush would work as well.

I like the idea of making the calls more specific here, but I'm somewhat afraid of hiding these functions away in their own files, and of scaling issues if we end up with more and more arch-specific calls, along with more architectures, in the future.

This just seems like it could blow up the driver with what could ultimately be unnecessary organization for a fairly small number of calls, if we only ever add a few platforms and a few divergent calls.
What do you think?

Regards,
Casey

>
> Regards,
>
> Tvrtko
>
>>
>> I would appreciate any input on how we can avoid scaling issues when
>> including multiple architectures and multiple functions (as the number
>> of function calls will inevitably increase with more architectures).
>>
>> Casey Bowman (1):
>>    i915/drm: Split out x86 and arm64 functionality
>>
>>   drivers/gpu/drm/i915/Makefile              |  4 +++
>>   drivers/gpu/drm/i915/i915_drv.h            |  6 +---
>>   drivers/gpu/drm/i915/i915_platform.h       | 16 +++++++++++
>>   drivers/gpu/drm/i915/i915_platform_arm64.c | 33 ++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_platform_x86.c   | 33 ++++++++++++++++++++++
>>   5 files changed, 87 insertions(+), 5 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform.h
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform_arm64.c
>>   create mode 100644 drivers/gpu/drm/i915/i915_platform_x86.c
>>
Tvrtko Ursulin Feb. 10, 2022, 11:10 a.m. UTC | #6
On 09/02/2022 05:25, Casey Bowman wrote:
> 
> On 2/7/22 07:36, Tvrtko Ursulin wrote:
>>
>> On 20/01/2022 22:16, Casey Bowman wrote:
>>> In this RFC I would like to ask the community their thoughts
>>> on how we can best handle splitting architecture-specific
>>> calls.
>>>
>>> I would like to address the following:
>>>
>>> 1. How do we want to split architecture calls? Different object files
>>> per platform? Separate function calls within the same object file?
>>
>> If we are talking about per-platform divergence of significant 
>> functions (not necessarily in size but their height position in the 
>> i915 stack) I agree with Jani that top-level per platform organisation 
>> is not the best choice.
>>
>> On the other hand I doubt that there should be many, if any, such 
>> functions. In practice I think it should be only low level stuff which 
>> diverges.
> I agree, as said with my reply to Jani, I think maybe going forward for 
> arch-specific code, #if IS_ENABLED calls should be used sparingly, only 
> in the cases where we do have that arch-specific implementation (like 
> low level calls), instead of just a 'return null', as in my case.
>>
>> On a concrete example..
>>
>>> 2. How do we address dummy functions? If we have a function call that is
>>> used for one or more platforms, but is not used in another, what should
>>> we do for this case?
>>
>> ... depends on the situation. Sometimes a flavour of "warn on once" 
>> can be okay I guess, but also why not build bug on? Because..
> True, with Jani's and your comments, I'm thinking that in the case of 
> when we look at a potential arch-specific function where we're just 
> returning null or something similar, we should be re-evaluating the 
> function and seeing if we should make a different change there.
>>
>>>
>>> I've given an example of splitting an architecture call
>>> in my patch with run_as_guest() being split into different
>>> implementations for x86 and arm64 in separate object files, sharing
>>> a single header.
>>
>> ... run_as_guest may be a very tricky example, given it is called from 
>> intel_vtd_active which has a number of callers.
>>
>> What is correct behaviour on Arm in this example? None of these call 
>> sites will run on Arm? Or that you expect the warn on added in this 
>> patch to trigger as a demonstration? If so, what is the plan going 
>> forward? We can take one example and talk about it hypothetically:
>>
>> ./i915_driver.c:        drm_printf(p, "iommu: %s\n", 
>> enableddisabled(intel_vtd_active(i915)));
>>
>> What is the "fix" (refactor) for Arm here? Looks like a new top-level 
>> function is needed which does not carry the intel_vtd_ prefix but 
>> something more generic. That one could then legitimately "warn on 
>> once", while for intel_vtd_active it would be wrong to do so.
> 
> Good point, run_as_guest might be a bit more challenging of an example.
> 
> In my mind, I was thinking of just simply returning null for arm64 here 
> as I don't believe arm64 would be making use of iommu for our cases (at 
> least, in the short term).
> I think this example function specifically needs to get reworked, as you 
> mentioned, in some way, possibly refactoring intel_vtd_active or 
> something along those lines.
> 
>>
>> And when I say it is needed.. well perhaps it is not strictly needed 
>> in this case, but in some other cases I think we go back to the 
>> problem I stated some months ago and that is that I suspect use of 
>> intel_vtd_active is overloaded. I think it is currently used to answer 
>> all these questions: 1. Is the IOMMU active, just for information.; 2. 
>> Is the IOMMU active and we want to counteract the performance hit by 
>> say using huge pages, adjusting the display bandwidth calculations or 
>> whatever. (In which case we also may want to distinguish between 
>> pass-through and translation modes.); 3. Is a potentially buggy IOMMU 
>> active and we need to work around it. All these under one kind of 
>> worked with one iommu implementation but does it with a different IOMMU?
>>
>> Which I mean leads to end conclusion that this particular function is 
>> a tricky example to answer the questions asked. :)
>>
>>>
>>> Another suggestion from Michael (michael.cheng@intel.com) involved
>>> using a single object file, a single header, and splitting various
>>> function calls via ifdefs in the header file.
>>
>> In principle, mostly what you have outlined sounds acceptable to me, 
>> with the difference that I would not use i915_platform, but for this 
>> particular example something like i915_hypervisor prefix.
>>
>> Then I would prepare i915 with the same scheme kernel uses, not just 
>> for source file divergence, but header file as well. That is:
>>
>> some_source.c:
>>
>> #include "i915_hypervisor.h"
>>
>> i915_hypervisor.h:
>>
>> #include "platform/i915_hypervisor.h"
>>
>> Then in i915 root you could have:
>>
>> platforms/x86/include/platform/i915_hypervisor.h
>> platforms/arm/include/platform/i915_hypervisor.h
>>
>> And some kbuild stuff to make that work. Is this doable and does it 
>> make sense? Per-platform source files could live in there as well.
>>
>> Same scheme for i915_clflush would work as well.
> 
> I like the idea of getting more specific for the calls here, but I'm 
> somewhat afraid of obfuscating these functions to their own files in 
> addition to scaling issues if we do have more and more arch-specific 
> calls, along with more architectures in the future.
> 
> This just seems like it could blow up the driver in what could 
> ultimately be unnecessary organization for a fewer number of calls if we 
> just add a few platforms and different calls.
> What do you think?

I don't have a good crystal ball to see how many of these you would end up with during the Arm porting effort; my gut feeling is just that it shouldn't be a problem to add a few files.

And I don't think it would be obfuscating anything; on the contrary, it makes it very clear where the platform-dependent boundary is and which bits diverge per platform.

Personally I would just do all of this as part of your other series which touches clflush, and in that way establish the pattern from the start. Plus, that way I sleep easy, not worrying about how we started penalizing old platforms with needless new function calls on hot paths. But that is my opinion only and other people may think differently.

Regards,

Tvrtko
Bowman, Casey G Feb. 10, 2022, 8:12 p.m. UTC | #7
On 2/10/22 03:10, Tvrtko Ursulin wrote:
>
> On 09/02/2022 05:25, Casey Bowman wrote:
>>
>> On 2/7/22 07:36, Tvrtko Ursulin wrote:
>>>
>>> On 20/01/2022 22:16, Casey Bowman wrote:
>>>> In this RFC I would like to ask the community their thoughts
>>>> on how we can best handle splitting architecture-specific
>>>> calls.
>>>>
>>>> I would like to address the following:
>>>>
>>>> 1. How do we want to split architecture calls? Different object files
>>>> per platform? Separate function calls within the same object file?
>>>
>>> If we are talking about per-platform divergence of significant 
>>> functions (not necessarily in size but their height position in the 
>>> i915 stack) I agree with Jani that top-level per platform 
>>> organisation is not the best choice.
>>>
>>> On the other hand I doubt that there should be many, if any, such 
>>> functions. In practice I think it should be only low level stuff 
>>> which diverges.
>> I agree, as said with my reply to Jani, I think maybe going forward 
>> for arch-specific code, #if IS_ENABLED calls should be used 
>> sparingly, only in the cases where we do have that arch-specific 
>> implementation (like low level calls), instead of just a 'return 
>> null', as in my case.
>>>
>>> On a concrete example..
>>>
>>>> 2. How do we address dummy functions? If we have a function call 
>>>> that is
>>>> used for one or more platforms, but is not used in another, what 
>>>> should
>>>> we do for this case?
>>>
>>> ... depends on the situation. Sometimes a flavour of "warn on once" 
>>> can be okay I guess, but also why not build bug on? Because..
>> True, with Jani's and your comments, I'm thinking that in the case of 
>> when we look at a potential arch-specific function where we're just 
>> returning null or something similar, we should be re-evaluating the 
>> function and seeing if we should make a different change there.
>>>
>>>>
>>>> I've given an example of splitting an architecture call
>>>> in my patch with run_as_guest() being split into different
>>>> implementations for x86 and arm64 in separate object files, sharing
>>>> a single header.
>>>
>>> ... run_as_guest may be a very tricky example, given it is called 
>>> from intel_vtd_active which has a number of callers.
>>>
>>> What is correct behaviour on Arm in this example? None of these call 
>>> sites will run on Arm? Or that you expect the warn on added in this 
>>> patch to trigger as a demonstration? If so, what is the plan going 
>>> forward? We can take one example and talk about it hypothetically:
>>>
>>> ./i915_driver.c:        drm_printf(p, "iommu: %s\n", 
>>> enableddisabled(intel_vtd_active(i915)));
>>>
>>> What is the "fix" (refactor) for Arm here? Looks like a new 
>>> top-level function is needed which does not carry the intel_vtd_ 
>>> prefix but something more generic. That one could then legitimately 
>>> "warn on once", while for intel_vtd_active it would be wrong to do so.
>>
>> Good point, run_as_guest might be a bit more challenging of an example.
>>
>> In my mind, I was thinking of just simply returning null for arm64 
>> here as I don't believe arm64 would be making use of iommu for our 
>> cases (at least, in the short term).
>> I think this example function specifically needs to get reworked, as 
>> you mentioned, in some way, possibly refactoring intel_vtd_active or 
>> something along those lines.
>>
>>>
>>> And when I say it is needed.. well perhaps it is not strictly needed 
>>> in this case, but in some other cases I think we go back to the 
>>> problem I stated some months ago and that is that I suspect use of 
>>> intel_vtd_active is overloaded. I think it is currently used to 
>>> answer all these questions: 1. Is the IOMMU active, just for 
>>> information.; 2. Is the IOMMU active and we want to counteract the 
>>> performance hit by say using huge pages, adjusting the display 
>>> bandwidth calculations or whatever. (In which case we also may want 
>>> to distinguish between pass-through and translation modes.); 3. Is a 
>>> potentially buggy IOMMU active and we need to work around it. All 
>>> these under one kind of worked with one iommu implementation but 
>>> does it with a different IOMMU?
>>>
>>> Which I mean leads to end conclusion that this particular function 
>>> is a tricky example to answer the questions asked. :)
>>>
>>>>
>>>> Another suggestion from Michael (michael.cheng@intel.com) involved
>>>> using a single object file, a single header, and splitting various
>>>> function calls via ifdefs in the header file.
>>>
>>> In principle, mostly what you have outlined sounds acceptable to me, 
>>> with the difference that I would not use i915_platform, but for this 
>>> particular example something like i915_hypervisor prefix.
>>>
>>> Then I would prepare i915 with the same scheme kernel uses, not just 
>>> for source file divergence, but header file as well. That is:
>>>
>>> some_source.c:
>>>
>>> #include "i915_hypervisor.h"
>>>
>>> i915_hypervisor.h:
>>>
>>> #include "platform/i915_hypervisor.h"
>>>
>>> Then in i915 root you could have:
>>>
>>> platforms/x86/include/platform/i915_hypervisor.h
>>> platforms/arm/include/platform/i915_hypervisor.h
>>>
>>> And some kbuild stuff to make that work. Is this doable and does it 
>>> make sense? Per-platform source files could live in there as well.
>>>
>>> Same scheme for i915_clflush would work as well.
>>
>> I like the idea of getting more specific for the calls here, but I'm 
>> somewhat afraid of obfuscating these functions to their own files in 
>> addition to scaling issues if we do have more and more arch-specific 
>> calls, along with more architectures in the future.
>>
>> This just seems like it could blow up the driver in what could 
>> ultimately be unnecessary organization for a fewer number of calls if 
>> we just add a few platforms and different calls.
>> What do you think?
>
> I don't have a good crystal ball to see how many of these you would 
> end up during the Arm porting effort, just my gut feeling that it 
> shouldn't be a problem to add a few files.
>
> And I don't think it would be obfuscating anything, on the contrary it 
> makes things very clear in respect where the platform dependent 
> boundary is and what are all the bits that diverge per platforms.

Fair enough, I'll look into writing up something as an update for 
further review.

>
> Personally I would just do this all as part of your other series which 
> touches clflush and in that way establish a pattern from the start. 
> Plus that way I sleep easy not thinking how we started penalizing old 
> platforms with needless new function calls on hot paths. But it is my 
> opinion only and other people may think differently.

I believe that series is authored by Michael, not myself :P
But I do agree with trying to establish a precedent here for future, similar calls.
It would definitely be helpful to get more opinions and build consensus on this, as we know there are many ways to go about it.

In the meantime, to form a basic guideline that others can help mold, I believe the precedent at the moment for porting i915 functionality to other architectures is:
- If the functionality has low-level code that must be ported to another arch, use #if IS_ENABLED to split the function implementations by arch config (this should be used sparingly).
- If the functionality is specific to the x86 architecture (i.e. it has library calls that only pertain to x86), split the call out into a platform-dependent area, so long as there isn't an already-existing generic call that could be used in a refactor instead (see the sketch below).
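
For example (a sketch only, with a made-up function name), a helper wrapping an x86-only cache-flush library call under that second bullet might end up looking like:

/* shared header -- one declaration for every arch */
void i915_flush_range(void *addr, unsigned int size);

/* x86 flavour of the platform-dependent source file */
#include <asm/cacheflush.h>

void i915_flush_range(void *addr, unsigned int size)
{
        /* clflush_cache_range() is an x86-only library call */
        clflush_cache_range(addr, size);
}

Other architectures would provide their own implementation, or, where an existing generic call already covers the need, the caller would be refactored to use that instead.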

Does this sort of guideline make sense/sound correct? I want to make 
something we can reference when porting various pieces of functionality 
throughout the driver.

Regards,
Casey